Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
I am trying to save a PDF file by saving the data from the FDF into a PDFTemplate, in my WPF application.
Here is the situation. I have a PDFTemplate.pdf that serves as a template and has placeholders (fields). I generate an FDF file programmatically, which contains all the field names required to fill in the PDFTemplate. The FDF also contains the file path of the PDFTemplate, so that on opening it knows which PDF to use.
When I double-click the FDF, it opens Adobe Acrobat Reader and displays the PDFTemplate with the data filled in. But I can't save this file using the File menu, as it says the file will be saved without the data.
I would like to know if it is possible to import the FDF data into the PDF and save it without using a third-party component.
Also, if it is very difficult to do this, what would be the possible solution in terms of a free library that would be able to do it?
I just realized that iTextSharp is not free for commercial applications.
I have been able to achieve this using another library PDFSharp.
It works much like iTextSharp, although in some places iTextSharp is better and easier to use. I am posting the code in case someone wants to do something similar:
//Create a copy of the original PDF file from source
//to the destination location
File.Copy(formLocation, outputFileNameAndPath, true);
//Open the newly created PDF file
using (var pdfDoc = PdfSharp.Pdf.IO.PdfReader.Open(
    outputFileNameAndPath,
    PdfSharp.Pdf.IO.PdfDocumentOpenMode.Modify))
{
    //Get the fields from the PDF into which the data
    //is supposed to be inserted
    var pdfFields = pdfDoc.AcroForm.Fields;
    //Ask the viewer to regenerate field appearances so the new values show up
    if (pdfDoc.AcroForm.Elements.ContainsKey("/NeedAppearances") == false)
    {
        pdfDoc.AcroForm.Elements.Add(
            "/NeedAppearances",
            new PdfSharp.Pdf.PdfBoolean(true));
    }
    else
    {
        pdfDoc.AcroForm.Elements["/NeedAppearances"] =
            new PdfSharp.Pdf.PdfBoolean(true);
    }
    //Remembers each field's original read-only flag so it can be restored
    bool flag = false;
    //Iterate through the fields from PDF
    for (int i = 0; i < pdfFields.Count(); i++)
    {
        try
        {
            //Get the current PDF field
            var pdfField = pdfFields[i];
            flag = pdfField.ReadOnly;
            //Check if it is readonly and make it false
            if (pdfField.ReadOnly)
            {
                pdfField.ReadOnly = false;
            }
            //fdfDataDictionary (defined elsewhere) maps field names to FDF values
            pdfField.Value = new PdfSharp.Pdf.PdfString(
                fdfDataDictionary.Where(
                    p => p.Key == pdfField.Name)
                .FirstOrDefault().Value);
            //Set the Readonly flag back to the field
            pdfField.ReadOnly = flag;
        }
        catch (Exception ex)
        {
            //Pass the original exception along so its stack trace is not lost
            throw new Exception(ERROR_FILE_WRITE_FAILURE + ex.Message, ex);
        }
    }
    //Save the PDF to the output destination
    pdfDoc.Save(outputFileNameAndPath);
    pdfDoc.Close();
}
I am using Microsoft.Office.Interop.Word to access Word documents through c#. Some of the Word documents have objects inside them. This is the equivalent of email attachments.
To insert some file in a Word document in Word 2007, you go to Insert -> Object -> Object... and select some file.
My question is, how do I get the file out using C#?
Here is an example of how it is done with an email using Outlook:
protected Microsoft.Office.Interop.Outlook.ApplicationClass outlookApplication = null;
protected Microsoft.Office.Interop.Outlook._MailItem mailItem = null;
protected Microsoft.Office.Interop.Outlook.NameSpace mapi = null;
public OutlookFileExtracter(string filename, string contentPrefix, int startAttachmentNumber)
{
    this.outlookApplication = new Microsoft.Office.Interop.Outlook.ApplicationClass();
    this.mapi = outlookApplication.GetNamespace("MAPI");
    mailItem = mapi.OpenSharedItem(filename) as Microsoft.Office.Interop.Outlook._MailItem;
}
public Collection<string> GetFileNames()
{
String extension;
if (this.fileNamesOrig == null)
{
int numberOfFiles = this.mailItem.Attachments.Count;
this.fileNamesOrig = new Collection<string>();
this.fileNamesDest = new Collection<string>();
this.fileValidBools = new Collection<bool>();
for (int i = 0; i < numberOfFiles; i++)
{
//First attachment number is 1
fileNamesOrig.Add(this.mailItem.Attachments[i + 1].FileName);
this.fileValidBools.Add(false);
}
for (int la = 0; la < numberOfFiles; la++)
{
extension = Path.GetExtension(fileNamesOrig[la]).ToUpper().Trim('.');
this.fileNamesDest.Add(this.contentPrefix + (this.startAttachmentNumber + la) + "." + extension);
}
}
return this.fileNamesOrig;
}
Apparently the Microsoft.Office.Interop.Word doesn't use attachments, but then I don't know what it is called. Any ideas?
You may be referring to OLE, which is heavily used in Office documents. From the wikipedia article: http://en.wikipedia.org/wiki/Object_Linking_and_Embedding
Object Linking and Embedding (OLE) is a technology developed by Microsoft that allows embedding and linking to documents and other objects. For developers, it brought OLE Control eXtension (OCX), a way to develop and use custom user interface elements. On a technical level, an OLE object is any object that implements the IOleObject interface, possibly along with a wide range of other interfaces, depending on the object's needs.
That article will initially look unrelated to your question; however, it describes what is being used.
If you want to skip the meat, scroll right down to the bottom where you'll find an 'external link' to: http://www.pldaniels.com/ripole/
ripOLE is a small program/library designed to pull out attachments from OLE2 data files (ie, MS Office documents). ripOLE is BSD licenced meaning that commercial projects can also use the code without worry of licence costs or legal liabilities.
You could try using the System.IO.Packaging classes to read the data. A Word 2007 file is just a zip file, so the objects you're after are probably inside in a format you can read.
There's a collection of articles on MSDN titled "Word 2007 Visual How Tos" that might be of some use:
http://msdn.microsoft.com/en-us/library/gg537324(v=office.12).aspx
You can read about the Open XML Format SDK here:
http://msdn.microsoft.com/en-us/library/bb448854(v=office.12).aspx
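To illustrate the "just a zip file" point, here is a minimal sketch that lists and reads embedded payloads. It swaps in System.IO.Compression.ZipArchive for System.IO.Packaging, purely because it needs no extra assembly reference. It assumes that Word keeps embedded objects under word/embeddings/ inside the container, which is the usual layout but is not stated above. The class and method names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class DocxEmbeddings
{
    // Assumption: embedded objects live under this folder inside the .docx container.
    private const string EmbeddingsFolder = "word/embeddings/";

    // Returns the entry names of all embedded payloads found in the .docx stream.
    public static List<string> ListEmbeddedEntries(Stream docxStream)
    {
        var names = new List<string>();
        using (var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                // Skip directory entries (empty Name) and anything outside the embeddings folder.
                if (entry.Name.Length > 0 &&
                    entry.FullName.StartsWith(EmbeddingsFolder, StringComparison.OrdinalIgnoreCase))
                {
                    names.Add(entry.FullName);
                }
            }
        }
        return names;
    }

    // Copies one embedded payload out of the container into a byte array.
    public static byte[] ReadEntry(Stream docxStream, string entryName)
    {
        using (var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.GetEntry(entryName);
            if (entry == null) throw new FileNotFoundException("No such entry: " + entryName);
            using (Stream source = entry.Open())
            using (var buffer = new MemoryStream())
            {
                source.CopyTo(buffer);
                return buffer.ToArray();
            }
        }
    }
}
```

Payloads with a .bin extension are typically still OLE containers, so they would need a further unpacking step (for example with something like ripOLE) before the original file is recovered.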
As Arafangion said, they are OLE objects. For most of them, if you know what they are, you can ask them to export their content somewhere else; see "Extract embedded document with the word document". For the others, you may need to extract the binary content and hope that your user can find an application to read it.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have a few general ideas on how I want to do this.
What I am trying to do is create a very simple front-end CMS in which a report is generated from a template using jQuery (drag, drop, etc.). The report will include placeholders into which data such as name and address is imported. This data can be changed by different users who have access to it.
I was thinking I would need to convert this HTML into XSL-FO and then generate a PDF from it, as XSL-FO gives me a major advantage in custom display of data in the PDF, i.e. the data will appear how I want it to. This would also let me do a lookup in the XSL-FO using XSLT (or something similar?) to import the latest database values. The tool that looks like it fits the bill for converting XSL-FO into PDF is fo.net. Ultimately I will need to use some code that is already out there, but I want to avoid that where I can.
Keep in mind:
I need ultimate control over everything (eventually)
Free / open source alternatives that are flexible (with source code)
Questions:
Is jQuery the best thing to use for the CMS? As I will be having custom controls which will contain database data or placeholders for data to be imported into
Is XSL-FO the best intermediary language to port this template into for rendering/ converting into a PDF?
How do I convert html into xsl-fo? Does c#/.net have an API I can look at?
Have I overcomplicated things? Any simpler ways to do this?
Note
The HTML + CSS on the page may be very complicated/ flexible so I may need to use jQuery to add the CSS inline to the elements, hence why I am thinking of using XSL-FO as I may be able to generate tags that can read this data and place it on the PDF in a certain way, please keep this in mind when answering my question (if you choose to!) :)
I have found PDFsharp and MigraDoc to be great for pdf generation.
I have created a pdf utility...
using System;
using System.IO;
using System.Web;
using System.Web.Mvc;
using PdfSharp.Pdf;
//Controller for a PdfResult
namespace Web.Utilities
{
public class PdfResult : ActionResult
{
public String Filename { get; set; }
protected MemoryStream pdfStream = new MemoryStream();
public PdfResult(PdfDocument doc)
{
Filename = String.Format("{0}.pdf", doc.Info.Title);
doc.Save(pdfStream, false);
}
public PdfResult(String pdfpath)
{
/* optional: if required, save to the file system */
throw new NotImplementedException("PdfResult is just an example and does not serve files from the filesystem.");
}
public override void ExecuteResult(ControllerContext context)
{
context.HttpContext.Response.Clear();
context.HttpContext.Response.ContentType = "application/pdf";
context.HttpContext.Response.AddHeader("Content-Disposition", "attachment; filename=" + Filename); // specify filename
context.HttpContext.Response.AddHeader("content-length", pdfStream.Length.ToString());
context.HttpContext.Response.BinaryWrite(pdfStream.ToArray());
context.HttpContext.Response.Flush();
pdfStream.Close();
context.HttpContext.Response.End();
}
}
}
And then you can render a view of pdf in the controller...
// Requires: using MigraDoc.DocumentObjectModel; using MigraDoc.Rendering;
public ActionResult Download()
{
    Document document = new Document();
    document.Info.Title = "Hello";
    Section section = document.AddSection();
    section.AddParagraph("Hello").AddFormattedText("World", TextFormat.Bold);
    PdfDocumentRenderer renderer = new PdfDocumentRenderer();
    renderer.Document = document;
    renderer.RenderDocument();
    return new PdfResult(renderer.PdfDocument);
}
I have found this to be a really neat and easy to control method of putting pdf into mvc.
To answer my own question: I have decided to use Fo.NET, a C# port of Apache FOP. I will generate my XML file on the fly, transform it into an XSL-FO document, and then send that off to create a PDF.
I have managed to do this quite successfully. Because XSL-FO is the intermediate format, I can throw out Fo.NET in the future and switch to other software, or even write my own, if needed. Hopefully over the next few months I will have a firmer answer on how flexible my choice actually was. :)
I will handle the front end with jQuery and jQuery UI.
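The XML-to-XSL-FO step described above can be sketched with the framework's own XslCompiledTransform. This is only an illustration under assumptions: the report XML shape, the stylesheet and the class name are invented for the example. The final rendering step with Fo.NET is left out because its API is not shown in this thread.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class FoTransformSketch
{
    // Hypothetical stylesheet: turns <report><name>..</name></report> into a one-page XSL-FO document.
    public const string Stylesheet = @"<?xml version='1.0'?>
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:fo='http://www.w3.org/1999/XSL/Format'>
  <xsl:template match='/report'>
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name='A4' page-height='29.7cm' page-width='21cm'>
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference='A4'>
        <fo:flow flow-name='xsl-region-body'>
          <fo:block font-weight='bold'><xsl:value-of select='name'/></fo:block>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>
</xsl:stylesheet>";

    // Applies the stylesheet to the data XML and returns the resulting XSL-FO markup as a string.
    public static string ToXslFo(string dataXml, string stylesheetXml)
    {
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(stylesheetXml)))
        {
            transform.Load(xsltReader);
        }
        using (var input = XmlReader.Create(new StringReader(dataXml)))
        using (var output = new StringWriter())
        {
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

Calling FoTransformSketch.ToXslFo("<report><name>Jane</name></report>", FoTransformSketch.Stylesheet) yields an fo:root document that any XSL-FO processor could take as input, which is what keeps the renderer replaceable.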
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Looking at other posts on this, I could not find an adequate solution for my needs. I am just trying to get the first page of a PDF document as a thumbnail. This is to run as a server application, so I would not want to write the PDF document out to a file and then call a third application that reads the PDF to generate the image on disk.
doc = new PDFdocument("some.pdf");
page = doc.page(1);
Image image = page.image;
Thanks.
Matthew Ephraim released an open source wrapper for Ghostscript that sounds like it does what you want and is in C#.
Link to Source Code: https://github.com/mephraim/ghostscriptsharp
Link to Blog Posting: http://www.mattephraim.com/blog/2009/01/06/a-simple-c-wrapper-for-ghostscript/
You can make a simple call to the GeneratePageThumb method to generate a thumbnail, or use GeneratePageThumbs with a start and end page number to generate thumbnails for multiple separate pages, with each page being a separate output file. The default file format is JPEG, but you can change it, along with many other options, by using the alternate GenerateOutput method and specifying options such as file format and page size.
I think the Windows API Code Pack for Microsoft .NET Framework might do the trick most easily. It can generate the same thumbnail that Windows Explorer does (which is the first page), and you can choose from several sizes, up to 1024x1024, so it should be enough. It is quite simple: just create ShellObject.FromParsingName(filepath) and use its Thumbnail property.
The problem might be what your server is. This works on Windows 7, Windows Vista and, I guess, Windows Server 2008. Also, Windows Explorer must be able to show thumbnails on that machine. The easiest way to ensure that is to install Adobe Reader. If none of this is a problem, I think this is the most elegant way.
UPDATE: Adobe Reader has dropped support for thumbnails in the recent versions so its legacy versions must be used.
UPDATE2: According to comment from Roberto, you can still use latest version of Adobe Reader if you turn on thumbnails option in Edit - Preferences - General.
Download PDFLibNet and use the following code
public void ConvertPDFtoJPG(string filename, String dirOut)
{
PDFLibNet.PDFWrapper _pdfDoc = new PDFLibNet.PDFWrapper();
_pdfDoc.LoadPDF(filename);
for (int i = 0; i < _pdfDoc.PageCount; i++)
{
Image img = RenderPage(_pdfDoc, i);
img.Save(Path.Combine(dirOut, string.Format("{0}{1}.jpg", i,DateTime.Now.ToString("mmss"))));
}
_pdfDoc.Dispose();
return;
}
public Image RenderPage(PDFLibNet.PDFWrapper doc, int page)
{
doc.CurrentPage = page + 1;
doc.CurrentX = 0;
doc.CurrentY = 0;
doc.RenderPage(IntPtr.Zero);
// create an image to draw the page into
var buffer = new Bitmap(doc.PageWidth, doc.PageHeight);
doc.ClientBounds = new Rectangle(0, 0, doc.PageWidth, doc.PageHeight);
using (var g = Graphics.FromImage(buffer))
{
var hdc = g.GetHdc();
try
{
doc.DrawPageHDC(hdc);
}
finally
{
g.ReleaseHdc();
}
}
return buffer;
}
I used to do this kind of stuff with imagemagick (Convert) long ago.
There is a .Net Wrapper for that, maybe it's worth checking out :
http://imagemagick.codeplex.com/releases/view/30302
http://www.codeproject.com/KB/cs/GhostScriptUseWithCSharp.aspx
This works very well. The only dependencies are GhostScript's gsdll32.dll (you need to download GhostScript separately to get this, but there is no need to have GhostScript installed in your production environment), and PDFSharp.dll which is included in the project.
Closed 5 years ago.
I have found various code and libraries for editing Exif.
But they are only lossless when the image width and height is multiple of 16.
I am looking for a library (or even a way to do it myself) to edit just the Exif portion in a JPEG file (or add Exif data if it doesn't exist yet), leaving the other data unmodified. Isn't that possible?
So far I could only locate the Exif portion (starts with 0xFFE1) but I don't understand how to read the data.
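For orientation, here is a rough sketch of how the segment behind the 0xFFE1 marker is laid out: the marker is followed by a two-byte big-endian length (which counts itself but not the marker), then the ASCII bytes "Exif" plus two zero bytes, and only then the TIFF-style tag data. The class and method names are hypothetical, and the walker is deliberately simplified: it does not handle fill bytes or markers that carry no length field.

```csharp
using System;

public static class ExifLocator
{
    // Returns the offset of the APP1 marker (0xFF 0xE1) that carries Exif data, or -1 if none is found.
    // segmentLength receives the value of the 2-byte length field (it counts itself but not the marker).
    public static int FindExifSegment(byte[] jpeg, out int segmentLength)
    {
        segmentLength = 0;
        // Every JPEG starts with the SOI marker FF D8.
        if (jpeg == null || jpeg.Length < 4 || jpeg[0] != 0xFF || jpeg[1] != 0xD8)
            return -1;

        int pos = 2;
        while (pos + 4 <= jpeg.Length)
        {
            if (jpeg[pos] != 0xFF) return -1;            // not at a marker: give up
            byte marker = jpeg[pos + 1];
            if (marker == 0xDA || marker == 0xD9) break; // SOS or EOI: no more header segments

            // Length is big-endian and includes its own two bytes.
            int length = (jpeg[pos + 2] << 8) | jpeg[pos + 3];
            if (length < 2 || pos + 2 + length > jpeg.Length) return -1;

            if (marker == 0xE1 && length >= 8 && HasExifHeader(jpeg, pos + 4))
            {
                segmentLength = length;
                return pos;
            }
            pos += 2 + length; // jump to the next marker
        }
        return -1;
    }

    // The Exif payload starts with the ASCII bytes "Exif" followed by two zero bytes.
    private static bool HasExifHeader(byte[] data, int offset)
    {
        byte[] header = { 0x45, 0x78, 0x69, 0x66, 0x00, 0x00 };
        for (int i = 0; i < header.Length; i++)
            if (data[offset + i] != header[i]) return false;
        return true;
    }
}
```

The bytes right after the six-byte header are the TIFF header ("II" or "MM" for the byte order), which is where tag parsing would start. Replacing only this segment and copying every other byte through unchanged is what leaves the compressed image data untouched.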
Here are the specifications for the Exif interchange format, if you plan to code your own library for editing tags.
http://www.exif.org/specifications.html
Here's a library written in Perl that meets your needs that you may be able to learn from:
http://www.sno.phy.queensu.ca/~phil/exiftool/
Here's a decent .NET library for Exif evaluation from The Code Project:
http://www.codeproject.com/KB/graphics/exiftagcol.aspx
You can do this without any external lib:
// Requires: using System.Drawing; using System.Drawing.Imaging;
// Create image.
Image image1 = Image.FromFile("c:\\Photo1.jpg");
// Get a PropertyItem from image1. Because PropertyItem does not
// have a public constructor, you first need to get an existing PropertyItem.
PropertyItem propItem = image1.GetPropertyItem(20624);
// Change the ID of the PropertyItem.
propItem.Id = 20625;
// Set the new PropertyItem for image1.
image1.SetPropertyItem(propItem);
// Save the image. Image.FromFile keeps the source file locked,
// so write to a different path.
image1.Save("c:\\Photo1_edited.jpg", ImageFormat.Jpeg);
You can find a list of all possible PropertyItem IDs (including Exif) here.
Update: Agreed, this method re-encodes the image on save. But I have remembered another method: Windows XP SP2 and later include new imaging components, WIC, and you can use them to write metadata losslessly; see "How-to: Re-encode a JPEG Image with Metadata".
exiv2net library (a .NET wrapper on top of exiv2) may be what you're looking for.
I wrote a small test that compresses one file many times to see the quality degradation. It is already visible by the third or fourth compression, which is very bad.
But luckily, if you always use the same QualityLevel with JpegBitmapEncoder, there is no degradation.
In this example I rewrite the keywords in the metadata 100 times and the quality does not seem to change.
private void LosslessJpegTest() {
var original = "d:\\!test\\TestInTest\\20150205_123011.jpg";
var copy = original;
const BitmapCreateOptions createOptions = BitmapCreateOptions.PreservePixelFormat | BitmapCreateOptions.IgnoreColorProfile;
for (int i = 0; i < 100; i++) {
using (Stream originalFileStream = File.Open(copy, FileMode.Open, FileAccess.Read)) {
BitmapDecoder decoder = BitmapDecoder.Create(originalFileStream, createOptions, BitmapCacheOption.None);
if (decoder.CodecInfo == null || !decoder.CodecInfo.FileExtensions.Contains("jpg") || decoder.Frames[0] == null)
continue;
BitmapMetadata metadata = decoder.Frames[0].Metadata == null
? new BitmapMetadata("jpg")
: decoder.Frames[0].Metadata.Clone() as BitmapMetadata;
if (metadata == null) continue;
var keywords = metadata.Keywords == null ? new List<string>() : new List<string>(metadata.Keywords);
keywords.Add($"Keyword {i:000}");
metadata.Keywords = new ReadOnlyCollection<string>(keywords);
JpegBitmapEncoder encoder = new JpegBitmapEncoder {QualityLevel = 80};
encoder.Frames.Add(BitmapFrame.Create(decoder.Frames[0], decoder.Frames[0].Thumbnail, metadata,
decoder.Frames[0].ColorContexts));
copy = original.Replace(".", $"_{i:000}.");
using (Stream newFileStream = File.Open(copy, FileMode.Create, FileAccess.ReadWrite)) {
encoder.Save(newFileStream);
}
}
}
}
Closed 8 years ago.
I need to read from an Outlook .MSG file in .NET without using the COM API for Outlook (because Outlook will not be installed on the machines my app will run on). Are there any free third-party libraries to do that? I want to extract the From, To, CC and BCC fields. The sent and received date fields would be good too, if they are also stored in MSG files.
There is code available on CodeProject for reading .msg files without COM. See here.
Update: I have found a 3rd party COM library called Outlook Redemption which is working fine for me at the moment. If you use it via COM-Interop in .NET, don't forget to release every COM object after you are done with it, otherwise your application crashes randomly.
Here's some sample VBA code using Outlook Redemption that Huseyint found.
Public Sub ProcessMail()
Dim Sess As RDOSession
Dim myMsg As RDOMail
Dim myString As String
Set Sess = CreateObject("Redemption.RDOSession")
Set myMsg = Sess.GetMessageFromMsgFile("C:\TestHarness\kmail.msg")
myString = myMsg.Body
myMsg.Body = Replace(myString, "8750", "XXXX")
myMsg.Save
End Sub
Microsoft has documented this: .MSG File Format Specification
It's a "Structured Storage" document. I've successfully used Andrew Peace's code to read these in the past, even under .NET (using C++/CLI) - it's clean and fairly easy to understand. Basically, you need to figure out which records you need, and query for those - it gets a little bit hairy, since different versions of Outlook and different types of messages will result in different records...
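As a small first step before any record parsing, one can check that a file really is a structured storage (Compound File Binary) document by looking at its first eight bytes. The sketch below does only that. The class and method names are hypothetical, and it does not read any message properties.

```csharp
using System;
using System.IO;

public static class MsgSniffer
{
    // Signature that opens every Compound File Binary (structured storage) file.
    private static readonly byte[] CompoundFileSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Reads the first eight bytes of the stream and compares them with the signature.
    public static bool LooksLikeCompoundFile(Stream stream)
    {
        var header = new byte[CompoundFileSignature.Length];
        int read = 0;
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) return false; // stream ended before eight bytes were available
            read += n;
        }
        for (int i = 0; i < header.Length; i++)
            if (header[i] != CompoundFileSignature[i]) return false;
        return true;
    }
}
```

A file that passes this check is a binary container of storages and streams, so the From, To and date values have to be looked up as property streams inside it rather than read as lines of text.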
You can try our (commercial) Rebex Secure Mail library. It can read Outlook's MSG format. The following code shows how:
// Load message
MailMessage message = new MailMessage();
message.Load(@"c:\Temp\t\message.msg");
// show From, To and Sent date
Console.WriteLine("From: {0}", message.From);
Console.WriteLine("To: {0}", message.To);
Console.WriteLine("Sent: {0}", message.Date.LocalTime);
// find and try to parse the first 'Received' header
MailDateTime receivedDate = null;
string received = message.Headers.GetRaw("Received");
if (received != null)
{
int lastSemicolon = received.LastIndexOf(';');
if (lastSemicolon >= 0)
{
string rawDate = received.Substring(lastSemicolon + 1);
MimeHeader header = new MimeHeader("Date", rawDate);
receivedDate = header.Value as MailDateTime;
}
}
// display the received date if available
if (receivedDate != null)
Console.WriteLine("Received: {0}", receivedDate.LocalTime);
More info on Sent and Received dates and how they are represented in the message can be found at http://forum.rebex.net/questions/816/extract-senttime-receivetime-and-time-zones
If you open the .MSG file in a text editor, I believe you will find that the information you are after is stored as plain text inside the file. (It is in all the messages I have checked, at least.)
It would be pretty easy to write some code to parse the file looking for lines beginning with "From:" or "To:" etc. and then extracting the information you need.
If you need the body of the email as well, that may be a bit more complicated.