When converting from docx to HTML, you can specify the output path for any images:
org.docx4j.Docx4J.toHTML(wordMLPackage, imageDirPath, imageTargetUri, fos2);
and the resulting HTML document references the images as files:
<img height="22" id="rId7" src="..cc6bcedf-2770-45ad-8e81-610bbd8746ceimage1.png" width="42">
Instead, I would like the converter to embed the images as base64. Is this possible?
You can write your own ConversionImageHandler implementation to do that.
The default implementation, HTMLConversionImageHandler, writes images to files.
To use your image handler, specify it via htmlSettings.setImageHandler.
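The core of such a handler is turning the image bytes into a data URI that can go straight into the src attribute. The plain-Java sketch below shows only that step; DataUris, toDataUri and imgTag are hypothetical names, and hooking it into a ConversionImageHandler implementation is left out because the handler's exact method signatures depend on your docx4j version.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class DataUris {
    // Builds a data URI (RFC 2397) such as "data:image/png;base64,iVBORw0..."
    // from raw image bytes and their MIME type.
    static String toDataUri(String mimeType, byte[] imageBytes) {
        String encoded = Base64.getEncoder().encodeToString(imageBytes);
        return "data:" + mimeType + ";base64," + encoded;
    }

    // Builds an <img> element whose src embeds the image instead of pointing at a file.
    static String imgTag(String mimeType, byte[] imageBytes, int width, int height) {
        return "<img src=\"" + toDataUri(mimeType, imageBytes) + "\" width=\"" + width
                + "\" height=\"" + height + "\">";
    }

    public static void main(String[] args) {
        byte[] fakeImage = "PNG".getBytes(StandardCharsets.US_ASCII); // stand-in for real image bytes
        System.out.println(imgTag("image/png", fakeImage, 42, 22));
        // prints <img src="data:image/png;base64,UE5H" width="42" height="22">
    }
}
```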
You do not need a custom ConversionImageHandler to achieve this.
You can simply set imageDirPath to an empty string, and the images will be embedded as base64:
org.docx4j.Docx4J.toHTML(wordMLPackage, "", "", fos2);
This works because org.docx4j.model.images.AbstractConversionImageHandler (from which HTMLConversionImageHandler derives) already handles this case for you.
Basically, I'm trying to create an HTML file. I already have the HTML written, but I want the user to be able to enter some text in the text boxes, save it into strings, and use those strings later when creating the HTML file.
I tried playing a bit with StreamWriter, but I don't think that is the best approach.
Also, after the file is created, I want it to open in the default web browser, or just in IE if that's easier.
I really need help, as I'm struggling, especially with the creation part.
Thanks for reading!
You can also do this without external libraries.
Set up your HTML file as follows:
<!DOCTYPE html>
<html>
<head>
<title>{MY_TITLE}</title>
</head>
<body></body>
</html>
Then edit and save the HTML from C#:
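using System.Diagnostics;
using System.IO;
using System.Net;

const string templateFile = "Foobar.html";
const string outputFile = "Foobar.out.html"; //separate file, so the template keeps its placeholders
//Read the HTML template from file
var content = File.ReadAllText(templateFile);
//Replace all values in the HTML, HTML-encoding the user's text
content = content.Replace("{MY_TITLE}", WebUtility.HtmlEncode(titleTextBox.Text));
//Write the new HTML string to the output file
File.WriteAllText(outputFile, content);
//Show it in the default application for handling .html files
//(UseShellExecute = true is required on .NET Core / .NET 5+)
Process.Start(new ProcessStartInfo(outputFile) { UseShellExecute = true });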
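For comparison, here are the same read, replace and write steps in Java, as a sketch: the {MY_TITLE} placeholder comes from the template above, while the HtmlTemplate class, its method names and the output file name are assumptions. The user's text is HTML-escaped before insertion, since a title such as <b> would otherwise be treated as markup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

final class HtmlTemplate {
    // Minimal escaping for text placed in element content or quoted attributes.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;") // first, so the entities added below are not re-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    // Replaces every {KEY} placeholder with the escaped value for KEY.
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> e : values.entrySet()) {
            result = result.replace("{" + e.getKey() + "}", escapeHtml(e.getValue()));
        }
        return result;
    }

    // Reads the template, fills it and writes the result to a separate file,
    // so the template keeps its placeholders for the next run.
    static Path render(Path template, Path output, Map<String, String> values) throws IOException {
        String html = Files.readString(template);
        Files.writeString(output, fill(html, values));
        return output;
    }

    public static void main(String[] args) throws IOException {
        Path template = Files.createTempFile("Foobar", ".html");
        Files.writeString(template, "<title>{MY_TITLE}</title>");
        Path out = render(template, template.resolveSibling("Foobar.out.html"),
                Map.of("MY_TITLE", "Tom & Jerry"));
        System.out.println(Files.readString(out)); // prints <title>Tom &amp; Jerry</title>
        // On a desktop machine, open it in the default browser with:
        // java.awt.Desktop.getDesktop().browse(out.toUri());
    }
}
```

Opening the browser is left as a comment because java.awt.Desktop is unavailable in headless environments.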
If you already have the HTML you want to export (just not customized), you could manually add format strings to it (like {0}, {1}, {2}) where you want to substitute text from your app, then embed it as a resource, load it at runtime, substitute the TextBox text using string.Format, and finally write it out again. This is admittedly a really fragile way to do it, as you need to make sure the number of parameters agrees between the resource file and your call to string.Format (and any literal braces in embedded CSS or JavaScript must be doubled as {{ and }}). In fact, this is a horrible way to do it. Actually, you should do it the way @EmilePels suggests, which is basically a less fragile version of this answer.
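Java's java.text.MessageFormat uses the same positional {0}, {1} placeholders as string.Format, so the approach can be sketched like this (the template text and the PositionalTemplate name are made up for illustration). It also shows one more source of fragility: MessageFormat treats a single quote as an escape character, so every literal apostrophe in the HTML has to be doubled.

```java
import java.text.MessageFormat;

final class PositionalTemplate {
    // {0} = page title, {1} = heading text; the arguments passed to format() must line up
    // with these indexes. The apostrophes are doubled ('') so MessageFormat emits them literally.
    static final String TEMPLATE = "<html><head><title>{0}</title></head>"
            + "<body><h1 class=''main''>{1}</h1></body></html>";

    static String render(String title, String heading) {
        return MessageFormat.format(TEMPLATE, title, heading);
    }

    public static void main(String[] args) {
        System.out.println(render("My page", "Hello"));
        // prints <html><head><title>My page</title></head><body><h1 class='main'>Hello</h1></body></html>
    }
}
```

A single undoubled apostrophe quotes everything after it, so later placeholders are silently emitted as literal text instead of being substituted.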
I included an image as a resource following this post:
How to create and use resources in .NET
I am using the PDFsharp library to create a PDF. The method that draws an image requires the path of the image. How do I get the path of Properties.Resources.Image?
Or is there another way to do this?
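Properties.Resources.Image is an in-memory resource.
You can save the image to a temporary file and then use that file's path.
var path = System.IO.Path.Combine(System.IO.Path.GetTempPath(), "logo.png"); // GetTempPath() alone is only the temp directory
Properties.Resources.logo.Save(path, System.Drawing.Imaging.ImageFormat.Png);
The code above uses Bitmap.Save.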
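A Java analogue of the temp-file approach, as a sketch: write the in-memory image to a uniquely named temporary file with javax.imageio.ImageIO and hand its path to whatever API insists on a filename. TempImage and saveToTempFile are hypothetical names; if the consuming library accepts streams or in-memory images, as PDFsharp does, the temp file is unnecessary.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.imageio.ImageIO;

final class TempImage {
    // Writes the image as PNG to a new temp file and returns its path.
    // The caller should delete the file once the consumer no longer needs it.
    static Path saveToTempFile(BufferedImage image) throws IOException {
        Path path = Files.createTempFile("logo", ".png"); // unique name in the temp directory
        if (!ImageIO.write(image, "png", path.toFile())) {
            throw new IOException("No PNG writer available");
        }
        return path;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage img = new BufferedImage(42, 22, BufferedImage.TYPE_INT_ARGB);
        Path path = saveToTempFile(img);
        System.out.println(path + " (" + Files.size(path) + " bytes)");
        Files.deleteIfExists(path);
    }
}
```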
You can actually create an image, without saving it, using XImage.FromGdiPlusImage():
var image = XImage.FromGdiPlusImage(Properties.Resources.logo);
Starting with PDFsharp/MigraDoc 1.50 beta 2, you no longer need a path when using MigraDoc. As already mentioned, PDFsharp itself does not need a filename, since images can be read from, e.g., streams.
MigraDoc still takes a string, but instead of a path you encode the image data as BASE64, prefix it with "base64:", and pass that string in place of the filename.
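The string is the literal prefix "base64:" followed by the BASE64-encoded image bytes. A Java sketch of building it, with a hypothetical class name, looks like this:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class MigraDocImageString {
    // MigraDoc 1.50+ accepts, in place of a filename, the literal prefix "base64:"
    // followed by the BASE64-encoded image file contents.
    static String fromBytes(byte[] imageBytes) {
        return "base64:" + Base64.getEncoder().encodeToString(imageBytes);
    }

    public static void main(String[] args) {
        byte[] fakeImage = "PNG".getBytes(StandardCharsets.US_ASCII); // stand-in for real image bytes
        System.out.println(fromBytes(fakeImage)); // prints base64:UE5H
    }
}
```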
See also:
http://pdfsharp.net/wiki/MigraDoc_FilelessImages.ashx
I want to know how to create an image object from the "src" info in an email. I already manage to read the inbox, parse its HTML, and extract all of the "src = foo" values from the images in the email. My question is how to proceed from there: how do I create an image using the information taken from the src attributes in the HTML? I need this object in order to store it in a SharePoint picture library. I just want to know how to create the image object for an image referenced in the HTML of the email.
I'm not sure how to put it in SharePoint, but assuming you have the src in an extractedSrc variable:
using System.Net;
using var webClient = new WebClient(); // disposed at the end of the scope (C# 8+)
webClient.DownloadFile(extractedSrc, localFileName);
I guess there are two basic cases you have to consider:
1. The src attribute points to an external image (i.e. an image stored on a website).
2. The src attribute points to an image attached to the email (typically a cid: reference).
For case 1, you need to download the image from the external server; then you can save it to your SharePoint library.
For case 2, you have to decode the attachment sections of the email to extract the file data; then you can save it to your library.
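A sketch of telling the cases apart in Java by URI scheme: per RFC 2392, a cid: URL refers to a MIME part of the message by its Content-ID (case 2), while other values are fetched from the web (case 1). A data: URI, which already contains the image, is handled as a third case. The ImageSource class and its names are hypothetical.

```java
import java.util.Locale;

final class ImageSource {
    enum Kind { ATTACHMENT, EMBEDDED_DATA, EXTERNAL }

    // Classifies an <img src> value by its URI scheme.
    static Kind classify(String src) {
        String s = src.trim().toLowerCase(Locale.ROOT);
        if (s.startsWith("cid:")) {
            return Kind.ATTACHMENT;    // case 2: image attached to the email
        }
        if (s.startsWith("data:")) {
            return Kind.EMBEDDED_DATA; // image bytes are already inline
        }
        return Kind.EXTERNAL;          // case 1: download it from its server
    }

    // For a cid: reference, the Content-ID to look up among the message's MIME parts.
    // (RFC 2392 allows %-encoding here, so some values may need URL-decoding.)
    static String contentId(String src) {
        return src.trim().substring("cid:".length());
    }

    public static void main(String[] args) {
        String src = "cid:image001.png@01D2";
        System.out.println(classify(src) + " -> " + contentId(src)); // prints ATTACHMENT -> image001.png@01D2
    }
}
```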
In my scenario, I want to download the HTML of a page (any page on the Internet) programmatically, but I also want all of the images in the HTML to be embedded in base64 format (not referenced).
In other words, instead of :
<img src='/images/delete.gif' />
I want the downloaded html to look like this:
<img src="data:image/gif;base64,R0lGODl..." />
This way I don't need to go through the process of storing all the images in directories, etc.
Does anyone have an idea how this can be done, or know of a plugin that does it efficiently?
Well, you'd need to:
Download the original HTML
Find each img element in the HTML (for instance using the HTML agility pack) and for each one:
If it's already using a data URL, ignore it
Otherwise:
Download the image (resolving a relative src against the page URL first)
Encode it in Base64 using Convert.ToBase64String
Replace the original img tag with one using the base64 version as a data URL (data:<MIME type>;base64,<data>), either in the original string or via a DOM representation
Save the final HTML to disk
Is any of these steps causing you a particular problem? You could potentially make it quicker by downloading the images in parallel, but I'd get a serial version working first.
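The steps above could be sketched in Java as follows. To stay self-contained, it swaps the HTML Agility Pack for a regular expression over img src attributes (adequate for simple markup, not a general HTML parser) and takes the download step as a function parameter, so ImageInliner, inlineImages and fetchAsDataUri are hypothetical names. Relative URLs are resolved against the page URL with java.net.URI.resolve.

```java
import java.net.URI;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImageInliner {
    // Matches the src attribute of an <img> tag: group 1 = everything up to the value,
    // group 2 = the quote character, group 3 = the URL. Requires whitespace before "src"
    // so attributes such as data-src are not matched.
    private static final Pattern IMG_SRC = Pattern.compile(
            "(<img\\b[^>]*?\\ssrc\\s*=\\s*)([\"'])(.*?)\\2", Pattern.CASE_INSENSITIVE);

    // Replaces each img src with the data URL produced by fetchAsDataUri.
    static String inlineImages(String html, URI pageUri, Function<URI, String> fetchAsDataUri) {
        Matcher m = IMG_SRC.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String src = m.group(3).trim();
            String replacement;
            if (src.regionMatches(true, 0, "data:", 0, 5)) {
                replacement = m.group();               // already a data URL: leave it alone
            } else {
                URI absolute = pageUri.resolve(src);   // "/images/delete.gif" -> absolute URL
                replacement = m.group(1) + m.group(2) + fetchAsDataUri.apply(absolute) + m.group(2);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<img src='/images/delete.gif' />";
        // Fake fetcher for demonstration; a real one would download and Base64-encode the image.
        String result = inlineImages(html, URI.create("http://example.com/page.html"),
                uri -> "data:image/gif;base64,R0lG");
        System.out.println(result); // prints <img src='data:image/gif;base64,R0lG' />
    }
}
```

Passing the fetcher in keeps the HTML rewriting testable offline; a parallel version could fetch all URLs first and then do the replacement.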
Instead of using an HTML page with images as base64-encoded strings in the src attribute, you might consider using the MHTML format. Most browsers support the format, and it embeds all external resources (including images).
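// Requires COM references to the "Microsoft CDO for Windows 2000 Library" and ADODB
var msg = new CDO.MessageClass();
msg.MimeFormatted = true;
// Fetch the page and its resources into an MHTML (multipart/related) message
msg.CreateMHTMLBody("http://www.google.com", CDO.CdoMHTMLFlags.cdoSuppressNone, "", "");
var stream = msg.GetStream();
var mhtml = stream.ReadText(stream.Size);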
Use a regular expression (regex) to extract URLs from img tags, translate them to absolute URLs using the Uri class, then use WebClient to download the target images. After that it's just a case of using Convert.ToBase64String to produce the Base64.
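The download-and-encode step might look like this in Java, as a sketch using java.net.URLConnection: the MIME type for the data URL is taken from the Content-Type response header, falling back to a guess from the file name, and application/octet-stream is an assumed last resort. ImageDownloader and downloadAsDataUri are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.util.Base64;

final class ImageDownloader {
    // Downloads the resource at an absolute URI and returns it as a data URL.
    static String downloadAsDataUri(URI absoluteUri) throws IOException {
        URLConnection conn = absoluteUri.toURL().openConnection();
        byte[] bytes;
        try (InputStream in = conn.getInputStream()) {
            bytes = in.readAllBytes();
        }
        String mime = conn.getContentType();
        if (mime == null || mime.isBlank() || mime.startsWith("content/unknown")) {
            // No usable header: guess from the file name extension instead
            String path = absoluteUri.getPath();
            mime = (path == null) ? null : URLConnection.guessContentTypeFromName(path);
        }
        if (mime == null) {
            mime = "application/octet-stream"; // assumed fallback when nothing else is known
        }
        int semicolon = mime.indexOf(';');    // drop parameters such as "; charset=..."
        if (semicolon >= 0) {
            mime = mime.substring(0, semicolon).trim();
        }
        return "data:" + mime + ";base64," + Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: ImageDownloader <absolute image URL>");
            return;
        }
        System.out.println(downloadAsDataUri(URI.create(args[0])));
    }
}
```

URLConnection also handles file: URLs, which makes the method easy to try without a network connection.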
I store emails and their attachments in a database. I'm using a WPF WebBrowser and its NavigateToString method to display the HTML body of emails. It works, but when emails use embedded images with a content ID (cid), I can't display them. I save all embedded images as attachments when I save emails to the database. I could create the images as temporary files for the current user and replace the cid references with absolute paths on the user's disk, but I don't think that's the best way...
Do you have any ideas?
I finally found a good way:
I replaced the cid references of all images with base64 image data (data URIs, RFC 2397) like this:
<img src="data:image/png;base64,RAAAtuhhx4dbgYKAAA7...more data....." alt="test">
You can use the following code to generate the base64 string :
string base64Str = Convert.ToBase64String(File.ReadAllBytes(@"C:\Temp\test.png"));
Remark: this doesn't work in IE6 (Internet Explorer supports data URIs only from version 8).
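The cid replacement itself might be sketched like this in Java. It assumes the attachments saved in the database can be loaded into a map keyed by Content-ID (without the angle brackets the Content-ID header usually carries); CidInliner, Attachment and inlineCids are hypothetical names, and a regular expression stands in for a real HTML parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CidInliner {
    // An attachment as loaded from the database (hypothetical shape).
    record Attachment(String mimeType, byte[] data) {}

    // Matches a quoted cid: URL such as "cid:image001.png@01D2"; group 1 = quote, group 2 = Content-ID.
    private static final Pattern CID = Pattern.compile("([\"'])cid:([^\"']+)\\1", Pattern.CASE_INSENSITIVE);

    // Replaces every cid: reference that has a matching attachment with a base64 data URL.
    static String inlineCids(String html, Map<String, Attachment> byContentId) {
        Matcher m = CID.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Attachment a = byContentId.get(m.group(2));
            String replacement = (a == null)
                    ? m.group() // unknown Content-ID: leave the reference untouched
                    : m.group(1) + "data:" + a.mimeType() + ";base64,"
                            + Base64.getEncoder().encodeToString(a.data()) + m.group(1);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Attachment> attachments = Map.of(
                "img1", new Attachment("image/png", "PNG".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(inlineCids("<img src=\"cid:img1\" alt=\"test\">", attachments));
        // prints <img src="data:image/png;base64,UE5H" alt="test">
    }
}
```

The resulting string can be passed straight to NavigateToString-style APIs, since no files on disk are involved.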