JPG to PDF Convertor in C# - c#

I would like to convert from an image (like jpg or png) to PDF.
I've checked out ImageMagickNET, but it is far too complex for my needs.
What other .NET solutions or code are there for converting an image to a PDF?

Easy with iTextSharp:
class Program
{
static void Main(string[] args)
{
Document document = new Document();
using (var stream = new FileStream("test.pdf", FileMode.Create, FileAccess.Write, FileShare.None))
{
PdfWriter.GetInstance(document, stream);
document.Open();
using (var imageStream = new FileStream("test.jpg", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
var image = Image.GetInstance(imageStream);
document.Add(image);
}
document.Close();
}
}
}

iTextSharp does it pretty cleanly and is open source. Also, it has a very good accompanying book by the author which I recommend if you end up doing more interesting things like managing forms. For normal usage, there are plenty resources on mailing lists and newsgroups for samples of how to do common things.
EDIT: as alluded to in #Chirag's comment, #Darin's answer has code that definitely compiles with current versions.
Example usage:
public static void ImagesToPdf(string[] imagepaths, string pdfpath)
{
using(var doc = new iTextSharp.text.Document())
{
iTextSharp.text.pdf.PdfWriter.GetInstance(doc, new FileStream(pdfpath, FileMode.Create));
doc.Open();
foreach (var item in imagepaths)
{
iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(item);
doc.Add(image);
}
}
}

Another working code, try it
public void ImagesToPdf(string[] imagepaths, string pdfpath)
{
iTextSharp.text.Rectangle pageSize = null;
using (var srcImage = new Bitmap(imagepaths[0].ToString()))
{
pageSize = new iTextSharp.text.Rectangle(0, 0, srcImage.Width, srcImage.Height);
}
using (var ms = new MemoryStream())
{
var document = new iTextSharp.text.Document(pageSize, 0, 0, 0, 0);
iTextSharp.text.pdf.PdfWriter.GetInstance(document, ms).SetFullCompression();
document.Open();
var image = iTextSharp.text.Image.GetInstance(imagepaths[0].ToString());
document.Add(image);
document.Close();
File.WriteAllBytes(pdfpath+"cheque.pdf", ms.ToArray());
}
}

One we've had great luck with is PDFSharp (we use it for TIFF and Text to PDF conversion for hundreds of medical claims every day).
http://pdfsharp.com/PDFsharp/

Such task can be easily done with help of Docotic.Pdf library.
Here is a sample that creates PDF from given images (not only JPGs, actually):
public static void imagesToPdf(string[] images, string pdfName)
{
using (PdfDocument pdf = new PdfDocument())
{
for (int i = 0; i < images.Length; i++)
{
if (i > 0)
pdf.AddPage();
PdfPage page = pdf.Pages[i];
string imagePath = images[i];
PdfImage pdfImage = pdf.AddImage(imagePath);
page.Width = pdfImage.Width;
page.Height = pdfImage.Height;
page.Canvas.DrawImage(pdfImage, 0, 0);
}
pdf.Save(pdfName);
}
}
Disclaimer: I work for the vendor of the library.

If you want to do it in a cross-platform way, without any thirty part library,
or paying any license, you can use this code.
It takes an array of pictures (I think it only works only with jpg) with its sizes and return a pdf file, with one picture per page.
You have to create two files:
File Picture:
using System;
using System.Collections.Generic;
using System.Text;
namespace PDF
{
public class Picture
{
private byte[] data;
private int width;
private int height;
public byte[] Data { get => data; set => data = value; }
public int Width { get => width; set => width = value; }
public int Height { get => height; set => height = value; }
}
}
File PDFExport:
using System;
using System.Collections.Generic;
namespace PDF
{
public class PDFExport
{
private string company = "Your Company Here";
public sbyte[] createFile(List<Picture> pictures)
{
int N = (pictures.Count + 1) * 3;
string dateTimeStr = DateTime.Now.ToString("yyyyMMddhhmmss");
string file1 =
"%PDF-1.4\n";
string file2 =
"2 0 obj\n" +
"<<\n" +
"/Type /Pages\n" +
getKids(pictures) +
"/Count " + pictures.Count + "\n" +
">>\n" +
"endobj\n" +
"1 0 obj\n" +
"<<\n" +
"/Type /Catalog\n" +
"/Pages 2 0 R\n" +
"/PageMode /UseNone\n" +
"/PageLayout /SinglePage\n" +
"/Metadata 7 0 R\n" +
">>\n" +
"endobj\n" +
N + " 0 obj\n" +
"<<\n" +
"/Creator(" + company + ")\n" +
"/Producer(" + company + ")\n" +
"/CreationDate (D:" + dateTimeStr + ")\n" +
"/ModDate (D:" + dateTimeStr + ")\n" +
">>\n" +
"endobj\n" +
"xref\n" +
"0 " + (N + 1) + "\n" +
"0000000000 65535 f\n" +
"0000224088 00000 n\n" +
"0000224031 00000 n\n" +
"0000000015 00000 n\n" +
"0000222920 00000 n\n" +
"0000222815 00000 n\n" +
"0000224153 00000 n\n" +
"0000223050 00000 n\n" +
"trailer\n" +
"<<\n" +
"/Size " + (N + 1) + "\n" +
"/Root 1 0 R\n" +
"/Info 6 0 R\n" +
">>\n" +
"startxref\n" +
"0\n" +
"%% EOF";
sbyte[] part1 = file1.GetBytes();
sbyte[] part2 = file2.GetBytes();
List<sbyte[]> fileContents = new List<sbyte[]>();
fileContents.Add(part1);
for (int i = 0; i < pictures.Count; i++)
{
fileContents.Add(getPageFromImage(pictures[i], i));
}
fileContents.Add(part2);
return getFileContent(fileContents);
}
private string getKids(List<Picture> pictures)
{
string kids = "/Kids[";
for (int i = 0; i < pictures.Count; i++)
{
kids += (3 * (i + 1) + 1) + " 0 R ";
}
kids += "]\n";
return kids;
}
private sbyte[] getPageFromImage(Picture picture, int P)
{
int N = (P + 1) * 3;
string imageStart =
N + " 0 obj\n" +
"<<\n" +
"/Type /XObject\n" +
"/Subtype /Image\n" +
"/Width " + picture.Width + "\n" +
"/Height " + picture.Height + "\n" +
"/BitsPerComponent 8\n" +
"/ColorSpace /DeviceRGB\n" +
"/Filter /DCTDecode\n" +
"/Length " + picture.Data.Length + "\n" +
">>\n" +
"stream\n";
string dimentions = "q\n" +
picture.Width + " 0 0 " + picture.Height + " 0 0 cm\n" +
"/X0 Do\n" +
"Q\n";
string imageEnd =
"\nendstream\n" +
"endobj\n" +
(N + 2) + " 0 obj\n" +
"<<\n" +
"/Filter []\n" +
"/Length " + dimentions.Length + "\n" +
">>\n" +
"stream\n";
string page =
"\nendstream\n" +
"endobj\n" +
(N + 1) + " 0 obj\n" +
"<<\n" +
"/Type /Page\n" +
"/MediaBox[0 0 " + picture.Width + " " + picture.Height + "]\n" +
"/Resources <<\n" +
"/XObject <<\n" +
"/X0 " + N + " 0 R\n" +
">>\n" +
">>\n" +
"/Contents 5 0 R\n" +
"/Parent 2 0 R\n" +
">>\n" +
"endobj\n";
List<sbyte[]> fileContents = new List<sbyte[]>();
fileContents.Add(imageStart.GetBytes());
fileContents.Add(byteArrayToSbyteArray(picture.Data));
fileContents.Add(imageEnd.GetBytes());
fileContents.Add(dimentions.GetBytes());
fileContents.Add(page.GetBytes());
return getFileContent(fileContents);
}
private sbyte[] byteArrayToSbyteArray(byte[] data)
{
sbyte[] data2 = new sbyte[data.Length];
for (int i = 0; i < data2.Length; i++)
{
data2[i] = (sbyte)data[i];
}
return data2;
}
private sbyte[] getFileContent(List<sbyte[]> fileContents)
{
int fileSize = 0;
foreach (sbyte[] content in fileContents)
{
fileSize += content.Length;
}
sbyte[] finaleFile = new sbyte[fileSize];
int index = 0;
foreach (sbyte[] content in fileContents)
{
for (int i = 0; i < content.Length; i++)
{
finaleFile[index + i] = content[i];
}
index += content.Length;
}
return finaleFile;
}
}
}
You can use the code in this easy way
///////////////////////////////////////Export PDF//////////////////////////////////////
private sbyte[] exportPDF(List<Picture> images)
{
if (imageBytesList.Count > 0)
{
PDFExport pdfExport = new PDFExport();
sbyte[] fileData = pdfExport.createFile(images);
return fileData;
}
return null;
}

You need Acrobat to be installed. Tested on Acrobat DC. This is a VB.net code. Due to that these objects are COM objects, you shall do a 'release object', not just a '=Nothing". You can convert this code here: https://converter.telerik.com/
Private Function ImageToPDF(ByVal FilePath As String, ByVal DestinationFolder As String) As String
Const PDSaveCollectGarbage As Integer = 32
Const PDSaveLinearized As Integer = 4
Const PDSaveFull As Integer = 1
Dim PDFAVDoc As Object = Nothing
Dim PDFDoc As Object = Nothing
Try
'Check destination requirements
If Not DestinationFolder.EndsWith("\") Then DestinationFolder += "\"
If Not System.IO.Directory.Exists(DestinationFolder) Then Throw New Exception("Destination directory does not exist: " & DestinationFolder)
Dim CreatedFile As String = DestinationFolder & System.IO.Path.GetFileNameWithoutExtension(FilePath) & ".pdf"
'Avoid conflicts, therefore previous file there will be deleted
If File.Exists(CreatedFile) Then File.Delete(CreatedFile)
'Get PDF document
PDFAVDoc = GetPDFAVDoc(FilePath)
PDFDoc = PDFAVDoc.GetPDDoc
If Not PDFDoc.Save(PDSaveCollectGarbage Or PDSaveLinearized Or PDSaveFull, CreatedFile) Then Throw New Exception("PDF file cannot be saved: " & PDFDoc.GetFileName())
If Not PDFDoc.Close() Then Throw New Exception("PDF file could not be closed: " & PDFDoc.GetFileName())
PDFAVDoc.Close(1)
Return CreatedFile
Catch Ex As Exception
Throw Ex
Finally
System.Runtime.InteropServices.Marshal.ReleaseComObject(PDFDoc)
System.Runtime.InteropServices.Marshal.FinalReleaseComObject(PDFDoc)
PDFDoc = Nothing
System.Runtime.InteropServices.Marshal.ReleaseComObject(PDFAVDoc)
System.Runtime.InteropServices.Marshal.FinalReleaseComObject(PDFAVDoc)
PDFAVDoc = Nothing
GC.Collect()
GC.WaitForPendingFinalizers()
GC.Collect()
End Try
End Function

not sure if you're looking for just free / open source solutions or considering commercial ones as well. But if you're including commercial solutions, there's a toolkit called EasyPDF SDK that offers an API for converting images (plus a number of other file types) to PDF. It supports C# and can be found here:
http://www.pdfonline.com/
The C# code would look as follows:
Printer oPrinter = new Printer();
ImagePrintJob oPrintJob = oPrinter.ImagePrintJob;
oPrintJob.PrintOut(imageFile, pdfFile);
To be fully transparent, I should disclaim that I do work for the makers of EasyPDF SDK (hence my handle), so this suggestion is not without some personal bias :) But feel free to check out the eval version if you're interested. Cheers!

I use Sautinsoft, its very simple:
SautinSoft.PdfMetamorphosis p = new SautinSoft.PdfMetamorphosis();
p.Serial="xxx";
p.HtmlToPdfConvertStringToFile("<html><body><img src=\""+filename+"\"></img></body></html>","output.pdf");

You may try to convert any Images to PDF using this code sample:
PdfVision v = new PdfVision();
ImageToPdfOptions options = new ImageToPdfOptions();
options.JpegQuality = 95;
try
{
v.ConvertImageToPdf(new string[] {inpFile}, outFile, options);
System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(outFile) { UseShellExecute = true });
}
catch (Exception ex)
{
Console.WriteLine($"Error: {ex.Message}");
Console.ReadLine();
}
Or if you need to convert Image Class to PDF:
System.Drawing.Image image = Image.FromFile(#"..\..\image-jpeg.jpg");
string outFile = new FileInfo(#"Result.pdf").FullName;
PdfVision v = new PdfVision();
ImageToPdfOptions options = new ImageToPdfOptions();
options.PageSetup.PaperType = PaperType.Auto;
byte[] imgBytes = null;
using (MemoryStream ms = new System.IO.MemoryStream())
{
image.Save(ms, System.Drawing.Imaging.ImageFormat.Jpeg);
imgBytes = ms.ToArray();
}
try
{
v.ConvertImageToPdf(imgBytes, outFile, options);
System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(outFile) { UseShellExecute = true });
}
catch (Exception ex)
{
Console.WriteLine($"Error: {ex.Message}");
Console.ReadLine();
}

Many diff tools out there. One I use is PrimoPDF (FREE) http://www.primopdf.com/ you go to print the file and you print it to pdf format onto your drive. works on Windows

Related

Splite PDF by Size using IText7

I have found a inbuilt method to split pdf by size, however i am not sure about the parameter as it is asking for parameter to split the pdf (long size) and not sure using the parameter the split file in kb, mb. Can someone please suggest me the usage of this parameter, support i want to split the file of 95MB into 5MB files, so what value i need to pass as parameter i.e. 5000, 500, 50 or 5.
Code:
PdfReader reader = new PdfReader(fileName);
PdfDocument sourceDoc = new PdfDocument(reader);
IList<PdfDocument> pdfDocs = new CustomPdfSplitter(sourceDoc, outputFileName).SplitBySize(fileSize);
Code for CustomPdfSplitter:
private class CustomPdfSplitter : PdfSplitter
{
private String dest;
private int partNumber = 1;
public CustomPdfSplitter(PdfDocument pdfDocument, String dest) : base(pdfDocument)
{
this.dest = dest;
}
protected override PdfWriter GetNextPdfWriter(PageRange documentPageRange)
{
return new PdfWriter(dest.Replace(".pdf","_" + (partNumber++).ToString("0000")+".pdf"));
}
}
public string SplitPDFBySize(string fileName, string outputFileName, int fileSize)
{
StringBuilder outputMsg = new StringBuilder();
PdfReader reader = new PdfReader(fileName);
PdfDocument sourceDoc = new PdfDocument(reader);
long fileSizeinByte = fileSize * 1000000;
IList<PdfDocument> pdfDocs = new CustomPdfSplitter(sourceDoc, outputFileName).SplitBySize(fileSizeinByte);
try
{
PBChild.Value = 0;
PBChild.Maximum = pdfDocs.Count;
for (int i = 0; i < pdfDocs.Count; i++)
{
string splittedFile = outputFileName.Replace(".pdf", "_" + (i + 1).ToString("0000") + ".pdf");
pdfDocs[i].Close();
outputMsg.AppendLine(fileName + " splitted with output file: " + splittedFile);
PBChild.Value++;
}
}
catch (Exception ex)
{
outputMsg.AppendLine("Error: " + fileName + "||" + ex.Message);
}
finally
{
sourceDoc.Close();
}
return outputMsg.ToString();
}
SplitBySize take as parameter size specified in bytes. So, to split document into 5 MB documents you need to pass 5*2^20 as parameter.
Although, you're right that the documentation was slightly uninformative. We've already fixed that and you can find these fixes in the SNAPSHOT version.

out of memory exception while drawing image in asp.net

We are drawing and editing pdf by using the following code,but when we play with 11mb size file,we are getting out of memory exception,can we fix the issue,I had used System.Drawing.Image.FromFile instead of FromStream but no luck..
public static string ImageCar()
{
string FileName = HttpContext.Current.Session["filename"].ToString();
Document doc = new Document(FileName);
ArrayList arrFiles = new ArrayList();
string strFileName = "";
for (int pageCount = 1; pageCount <= TotalPages; pageCount++)
{
using (FileStream imageStream = new FileStream(HttpContext.Current.Server.MapPath("Input/image_" + strDateTime + "_" + pageCount + ".png"), FileMode.Create, FileAccess.ReadWrite))
{
strFileName = HttpContext.Current.Server.MapPath("Path" + strDateTime + "_" + pageCount + ".png");
arrFiles.Add(strFileName);
PngDevice pngDevice = new PngDevice();
//Convert a particular page and save the image to stream
pngDevice.Process(doc.Pages[pageCount], imageStream);
using (System.Drawing.Image image = System.Drawing.Image.FromStream(imageStream))
{
ScaleImage(image, 1189, 835, HttpContext.Current.Server.MapPath("Input/image1_" + strDateTime + "_" + pageCount + ".png"), out height, out Aratio);
image.Dispose();
imageStream.Close();
if (pageCount == 1)
fields = CheckFields(doc, pageCount, "image1_" + strDateTime + "_" + pageCount + ".png", fields, Convert.ToDouble(Aratio), licensed);
pages = pages + "," + "image1_" + strDateTime + "_" + pageCount + ".png";
Ratios = Ratios + "," + Aratio;
Allheights = Allheights + "," + height;
// Delete file from image folder
try
{
if (File.Exists(strFileName))
{
File.Delete(strFileName);
}
}
catch (Exception ex)
{
}
}
}
}
Ratios = Ratios.Substring(1, Ratios.Length - 1);
pages = pages.Substring(1, pages.Length - 1);
Allheights = Allheights.Substring(1, Allheights.Length - 1);
if (fields != "")
{
fields = fields.Substring(3, fields.Length - 3);
}
return pages + "%#" + Ratios + "%#" + Allheights + "%#" + fields;
}
System.Drawing has known memory leaks in a ASP site, and should never be used.
It will eventually case OOM errors, and while you can avoid it by restarting the server every so often, or throwing more memory at it, there is no real, long term fix. Use some other library for the image manipulation portion, and you should have less issues.
See the bottom of this page:
https://msdn.microsoft.com/en-us/library/system.drawing(v=vs.110).aspx
Classes within the System.Drawing namespace are not supported for use within a Windows or ASP.NET service. Attempting to use these classes from within one of these application types may produce unexpected problems, such as diminished service performance and run-time exceptions. For a supported alternative, see Windows Imaging Components.

Improve the speed of this process of Converting multiple PDF to multiple tif images using GdPicture.NET

I am Converting multiple PDF to multiple tif images using GdPicture.NET. (using this sample code in a Windows Forms Application)
I need to improve the speed of this process to suit it for thousands of PDF files.
Below is a sample method I used to implement a threading. However this mix the pdf pages.
public void ThreadRun(string pdFilFullName, string batchDir){
GdPictureStatus status = new GdPictureStatus();
GdPictureImaging oGdPictureImaging = new GdPictureImaging();
GdPicturePDF oGdPicturePDF = new GdPicturePDF();
status = oGdPicturePDF.LoadFromFile(pdFilFullName, false);
for (int i = 1; i <= oGdPicturePDF.GetPageCount(); i++)
{
//select page
oGdPicturePDF.SelectPage(i);
//render selected page to GdPictureImage identifier
int rasterizedPageID = oGdPicturePDF.RenderPageToGdPictureImageEx(200.0f, true);
if (i == 1 || i < 10)
{
padding = "00";
}
else if (i == 10 || i < 100)
{
padding = "0";
}
else
{
padding = string.Empty;
}
//Set Image file name
filePath = batchDir + "\\" + padding + i + ".tif";
// Converting to black and White
oGdPictureImaging.FxBlackNWhite(rasterizedPageID, BitonalReduction.Stucki);
// Converting to Single pixel
oGdPictureImaging.ConvertTo1BppAT(rasterizedPageID);
// Saving each page of the PDF file to single TIFF image
status = oGdPictureImaging.SaveAsTIFF(rasterizedPageID, filePath, false, tiffType);
oGdPictureImaging.ReleaseGdPictureImage(rasterizedPageID);
//check for page errors
if (status != GdPictureStatus.OK)
{
Console.WriteLine("page error: " + pdFilFullName + status.ToString());
}
Application.DoEvents();
}
}
protected void pdftotiff(string filepath){
List<string> result = Directory.EnumerateFiles(filepath, "*.pdf", System.IO.SearchOption.TopDirectoryOnly).Union(Directory.EnumerateFiles(filepath, "*.tif", System.IO.SearchOption.TopDirectoryOnly)).ToList();
foreach(string file in result){
GdPicturePDF oGdPicturePDF = new GdPicturePDF();
GdPictureImaging oGdPictureImaging = new GdPictureImaging();
if ((_pdFileInfo.Name.Split('.')[1] != "tif") && (oGdPicturePDF.LoadFromFile(_pdFileInfo.FullName, false) == GdPictureStatus.OK))
{
batchDir = folderPath + "\\Batches\\" + _pdFileInfo.Name.Split('.')[0] + "." + batchDate.Substring(6, 2) + batchDate.Substring(4, 2);
batchname = _pdFileInfo.Name.Split('.')[0] + "." + batchDate.Substring(6, 2) + batchDate.Substring(4, 2);
if (!Directory.Exists(batchDir)){
Directory.CreateDirectory(batchDir);
}
Thread t = new Thread(() => ThreadRun(_pdFileInfo.FullName, batchDir));
t.Start();
}
}
Can you provide suggestion/samples.
Used Parallel.ForEach to process the pdf files like below.
List<string> result = Directory.EnumerateFiles(filepath, "*.pdf", System.IO.SearchOption.TopDirectoryOnly).Union(Directory.EnumerateFiles(filepath, "*.tif", System.IO.SearchOption.TopDirectoryOnly)).ToList();
Parallel.ForEach(result, new ParallelOptions { MaxDegreeOfParallelism=3}, file =>
{
try
{
GdPictureStatus status = new GdPictureStatus();
GdPictureImaging oGdPictureImaging = new GdPictureImaging();
GdPicturePDF oGdPicturePDF = new GdPicturePDF();
status = oGdPicturePDF.LoadFromFile(file, false);
if (status == GdPictureStatus.OK)
{
string batchDate = filepath.Substring(filepath.LastIndexOf("\\") + 1);
string padding = String.Empty;
string filePath = string.Empty;
FileInfo _pdFileInfo = new FileInfo(file);
string batchDir = filepath + "\\Batches\\" + _pdFileInfo.Name.Split('.')[0] + "." + batchDate.Substring(6, 2) + batchDate.Substring(4, 2);
string batchname = _pdFileInfo.Name.Split('.')[0] + "." + batchDate.Substring(6, 2) + batchDate.Substring(4, 2);
if (!Directory.Exists(batchDir))
{
Directory.CreateDirectory(batchDir);
}
for (int i = 1; i <= oGdPicturePDF.GetPageCount(); i++)
{
//select page
oGdPicturePDF.SelectPage(i);
//render selected page to GdPictureImage identifier
int rasterizedPageID = oGdPicturePDF.RenderPageToGdPictureImageEx(200.0f, true);
if (i == 1 || i < 10)
{
padding = "00";
}
else if (i == 10 || i < 100)
{
padding = "0";
}
else
{
padding = string.Empty;
}
//Set Image file name
filePath = batchDir + "\\" + padding + i + ".tif";
// Converting to black and White
oGdPictureImaging.FxBlackNWhite(rasterizedPageID, BitonalReduction.Stucki);
// Converting to Single pixel
oGdPictureImaging.ConvertTo1BppAT(rasterizedPageID);
// Saving each page of the PDF file to single TIFF image
status = oGdPictureImaging.SaveAsTIFF(rasterizedPageID, filePath, false, tiffType);
oGdPictureImaging.ReleaseGdPictureImage(rasterizedPageID);
}
}
oGdPictureImaging.Dispose();
oGdPicturePDF.Dispose();
}
catch (Exception g)
{
throw new ApplicationException(g.Message + file);
return;
}
}
);
}

Reading a large CSV file and processing in C#. Any suggestions?

I have a large CSV file around 25G. I need to parse each line which has around 10 columns and do some processing and finally save it to a new file with parsed data.
I am using dictionary as my datastructure. To avoid the memory overflow I am writing the file after 500,000 records and clearing the dictionary.
Can anyone suggest whether is this good way of doing. If not, any other better way of doing this? Right now it is taking 30 mins to process 25G file.
Here is the code
private static void ReadData(string filename, FEnum fileType)
{
var resultData = new ResultsData
{
DataColumns = new List<string>(),
DataRows = new List<Dictionary<string, Results>>()
};
resultData.DataColumns.Add("count");
resultData.DataColumns.Add("userid");
Console.WriteLine("Start Processing : " + DateTime.Now);
const long processLimit = 100000;
//ProcessLimit : 500000, TimeElapsed : 30 Mins;
//ProcessLimit : 100000, TimeElaspsed - Overflow
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
Dictionary<string, Results> parsedData = new Dictionary<string, Results>();
FileStream fileStream = new FileStream(filename, FileMode.Open, FileAccess.Read);
using (StreamReader streamReader = new StreamReader(fileStream))
{
string charsRead = streamReader.ReadLine();
int count = 0;
long linesProcessed = 0;
while (!String.IsNullOrEmpty(charsRead))
{
string[] columns = charsRead.Split(',');
string eventsList = columns[0] + ";" + columns[1] + ";" + columns[2] + ";" + columns[3] + ";" +
columns[4] + ";" + columns[5] + ";" + columns[6] + ";" + columns[7];
if (parsedData.ContainsKey(columns[0]))
{
Results results = parsedData[columns[0]];
results.Count = results.Count + 1;
results.Conversion = results.Count;
results.EventList.Add(eventsList);
parsedData[columns[0]] = results;
}
else
{
Results results = new Results {
Count = 1, Hash_Person_Id = columns[0], Tag_Id = columns[1], Conversion = 1,
Campaign_Id = columns[2], Inventory_Placement = columns[3], Action_Id = columns[4],
Creative_Group_Id = columns[5], Creative_Id = columns[6], Record_Time = columns[7]
};
results.EventList = new List<string> {eventsList};
parsedData.Add(columns[0], results);
}
charsRead = streamReader.ReadLine();
linesProcessed++;
if (linesProcessed == processLimit)
{
linesProcessed = 0;
SaveParsedValues(filename, fileType, parsedData);
//Clear Dictionary
parsedData.Clear();
}
}
}
stopwatch.Stop();
Console.WriteLine(#"File : {0} Batch Limit : {1} Time elapsed : {2} ", filename + Environment.NewLine, processLimit + Environment.NewLine, stopwatch.Elapsed + Environment.NewLine);
}
Thank you
The Microsoft.VisualBasic.FileIO.TextFieldParser class looks like it could do the job. Try it, it may speed things up.

How to convert a HTML File to PDF using WkHTMLToSharp / wkhtmltopdf with Images in C#

I am generating HTML files on the fly, and I would like to create a PDF from the final file. I am using the following to generate the HTML file:
public static void WriteHTML(string cFile, List<Movie> mList)
{
int lineID = 0;
string strHeader, strMovie, strGenre, tmpGenre = null;
string strPDF = null;
// initiates streamwriter for catalog output file
FileStream fs = new FileStream(cFile, FileMode.Create);
StreamWriter catalog = new StreamWriter(fs);
strHeader = "<style type=\"text/css\">\r\n" + "<!--\r\n" + "tr#odd {\r\n" + " background-color:#e2e2e2;\r\n" + " vertical-align:top;\r\n" + "}\r\n" + "\r\n" + "tr#even {\r\n" + " vertical-align:top;\r\n" + "}\r\n" + "div#title {\r\n" + " font-size:16px;\r\n" + " font-weight:bold;\r\n" + "}\r\n" + "\r\n" + "div#mpaa {\r\n" + " font-size:10px;\r\n" + "}\r\n" + "\r\n" + "div#genre {\r\n" + " font-size:12px;\r\n" + " font-style:italic;\r\n" + "}\r\n" + "\r\n" + "div#plot {\r\n" + " height: 63px;\r\n" + " font-size:12px;\r\n" + " overflow:hidden;\r\n" + "}\r\n" + "-->\r\n" + "</style>\r\n" + "\r\n" + "<html>\r\n" + " <body>\r\n" + " <table>\r\n";
catalog.WriteLine(strHeader);
strPDF = strHeader;
foreach (Movie m in mList)
{
tmpGenre = null;
strMovie = lineID == 0 ? " <tr id=\"odd\" style=\"page-break-inside:avoid\">\r\n" : " <tr id=\"even\" style=\"page-break-inside:avoid\">\r\n";
catalog.WriteLine(strMovie);
strPDF += strMovie;
foreach (string genre in m.Genres)
tmpGenre += ", " + genre + "";
strGenre = tmpGenre != null ? tmpGenre.Substring(2) : null;
strMovie = " <td>\r\n" + " <img src=\".\\images\\" + m.ImageFile + "\" width=\"75\" height=\"110\">\r\n" + " </td>\r\n" + " <td>\r\n" + " <div id=\"title\">" + m.Title + "</div>\r\n" + " <div id=\"mpaa\">" + m.Certification + " " + m.MPAA + "</div>\r\n" + " <div id=\"genre\">" + strGenre + "</div>\r\n" + " <div id=\"plot\">" + m.Plot + "</div>\r\n" + " </td>\r\n" + " </tr>\r\n";
catalog.WriteLine(strMovie);
strPDF += strMovie;
lineID = lineID == 0 ? 1 : 0;
}
string closingHTML = " </table>\r\n" + " </body>\r\n" + "</html>";
catalog.WriteLine(closingHTML);
strPDF += closingHTML;
WritePDF(strPDF, cFile + ".PDF");
catalog.Close();
}
Once completed, I want to call the following function to generate the PDF file:
public static void WritePDF(string cFile, string pdfFile)
{
WkHtmlToPdfConverter w = new WkHtmlToPdfConverter();
byte[] strHTML = w.Convert(cFile);
File.WriteAllBytes(pdfFile, strHTML);
w.Dispose();
}
I've discovered that the .Convert function will convert HTML code to PDF, not a file. Secondly, when I pass in the HTML code directly, the images are not appearing in the PDF. I know there is an issue with .GIF files, but these are all .JPG files.
I've read a lot about how good wkhtmltopdf is, and the guy who wrote WkHTMLToSharp posted his project all over SO, but I've been disappointed by the lack of documentation for it.
I WANT to be able to pass in a file to convert, change the margins (I know this is possible, I just need to figure out the correct settings), have it convert images correctly, and most importantly, to not break up my items across multiple pages (support "page-break-inside:avoid" or something similar).
I'd love to see how others are using this!
I have coded an example about how to create a PDF from HTML. I just updated it to also print images.
https://github.com/hmadrigal/playground-dotnet/tree/master/MsDotNet.PdfGeneration
(In my blog post I explain most of the project https://hmadrigal.wordpress.com/2015/10/16/creating-pdf-reports-from-html-using-dotliquid-markup-for-templates-and-wkhtmltoxsharp-for-printing-pdf/ )
Pretty much you have two options:
1: Using file:// and the fullpath to the file.
<img alt="profile" src="{{ employee.PorfileFileName | Prepend: "Assets\ProfileImage\" | ToLocalPath }}" />
2: Using URL Data (https://en.wikipedia.org/wiki/Data_URI_scheme)
<img alt="profile" src="data:image/png;base64,{{ employee.PorfileFileName | Prepend: "Assets\ProfileImage\" | ToLocalPath | ToBase64 }}" />
Cheers,
Herb
Use WkHtmlToXSharp.
Download the latest DLL from Github
public static string ConvertHTMLtoPDF(string htmlFullPath, string pageSize, string orientation)
{
string pdfUrl = htmlFullPath.Replace(".html", ".pdf");
try
{
#region USING WkHtmlToXSharp.dll
//IHtmlToPdfConverter converter = new WkHtmlToPdfConverter();
IHtmlToPdfConverter converter = new MultiplexingConverter();
converter.GlobalSettings.Margin.Top = "0cm";
converter.GlobalSettings.Margin.Bottom = "0cm";
converter.GlobalSettings.Margin.Left = "0cm";
converter.GlobalSettings.Margin.Right = "0cm";
converter.GlobalSettings.Orientation = (PdfOrientation)Enum.Parse(typeof(PdfOrientation), orientation);
if (!string.IsNullOrEmpty(pageSize))
converter.GlobalSettings.Size.PageSize = (PdfPageSize)Enum.Parse(typeof(PdfPageSize), pageSize);
converter.ObjectSettings.Page = htmlFullPath;
converter.ObjectSettings.Web.EnablePlugins = true;
converter.ObjectSettings.Web.EnableJavascript = true;
converter.ObjectSettings.Web.Background = true;
converter.ObjectSettings.Web.LoadImages = true;
converter.ObjectSettings.Load.LoadErrorHandling = LoadErrorHandlingType.ignore;
Byte[] bufferPDF = converter.Convert();
System.IO.File.WriteAllBytes(pdfUrl, bufferPDF);
converter.Dispose();
#endregion
}
catch (Exception ex)
{
throw new Exception(ex.Message, ex);
}
return pdfUrl;
}
You can use Spire.Pdf to do so.
This component could convert html to pdf.
PdfDocument pdfdoc = new PdfDocument();
pdfdoc.LoadFromHTML(fileFullName, true, true, true);
//String url = "http://www.e-iceblue.com/";
//pdfdoc.LoadFromHTML(url, false, true, true);
pdfdoc.SaveToFile("FromHTML.pdf");
We're also using wkhtmltopdf and are able to render images correctly. However, by default the rendering of images is disabled.
You have to specify those options on your converter instance:
var wk = _GetConverter()
wk.GlobalSettings.Margin.Top = "20mm";
wk.GlobalSettings.Margin.Bottom = "10mm";
wk.GlobalSettings.Margin.Left = "10mm";
wk.GlobalSettings.Margin.Right = "10mm";
wk.GlobalSettings.Size.PaperSize = PdfPaperSize.A4;
wk.ObjectSettings.Web.PrintMediaType = true;
wk.ObjectSettings.Web.LoadImages = true;
wk.ObjectSettings.Web.EnablePlugins = false;
wk.ObjectSettings.Web.EnableJavascript = true;
result = wk.Convert(htmlContent);

Categories

Resources