iTextSharp Can not Convert All HTML to PDF - c#

Using the sample codes from here I come up with these codes -
var my_html = this.GetmyReportHtml();
var my_css = this.GetmyReportHtmlCss();
Byte[] bytes;
using (var ms = new MemoryStream())
{
using (var doc = new iTextSharp.text.Document(PageSize.LETTER))
{
using (var writer = PdfWriter.GetInstance(doc, ms))
{
doc.Open();
try
{
using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(my_css)))
{
using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(my_html)))
{
iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
}
}
}
catch (Exception ex)
{
}
finally
{
doc.Close();
}
}
}
bytes = ms.ToArray();
}
System.IO.File.WriteAllBytes(#"c:\\temp\test.pdf", bytes);
PDF has been generated. However my_html has 6 pages in my case, only half of the content are converted to pdf.
Anyone knows what happened here? How to know if iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml works properly?
Thanks

Found out the problem. In this line of codes -
using (var doc = new iTextSharp.text.Document(PageSize.LETTER))
PageSize.LETTER can not be passed in.

Related

Merging uploaded images from mysql table into single pdf file in c#

I have uploaded documents (only image format) to mysql database through c#. This entry in the database table will have reference Id for each transaction. Upon clicking on download button, all uploaded images of respective table id should merge into a single pdf file. How can I Achieve this?. Require C# code to download
Below is the sample code which is not working. It returns system.byte[] as response
[HttpPost]
public static byte[] MergeFile()
{
PdfReader.unethicalreading = true;
using (iTextSharp.text.Document doc = new iTextSharp.text.Document())
{
doc.SetPageSize(PageSize.A4);
using (var ms = new MemoryStream())
{
// PdfWriter wri = PdfWriter.GetInstance(doc, ms);
using (PdfCopy pdf = new `enter code here`PdfCopy(doc, ms))
{
doc.Open();
using (MySqlConnection connection = new MySqlConnection(ConfigurationManager.ConnectionStrings[2].ConnectionString))
{
using (MyDBContext context = new MyDBContext(connection, false))
{
try
{
var getDocs = context.Document.Where(p => p.ReferenceId == 1).ToList();
foreach (DAL.Document info in getDocs)
{
try
{
doc.NewPage();
iTextSharp.text.Document imageDocument = null;
PdfWriter imageDocumentWriter = null;
switch (info.Extension.Trim('.').ToLower())
{
case "image/png":
using (imageDocument = new iTextSharp.text.Document())
{
using (var imageMS = new MemoryStream())
{
using (imageDocumentWriter = PdfWriter.GetInstance(imageDocument, imageMS))
{
imageDocument.Open();
if (imageDocument.NewPage())
{
var image = iTextSharp.text.Image.GetInstance(info.Documents);
image.Alignment = Element.ALIGN_CENTER;
image.ScaleToFit(doc.PageSize.Width - 10, doc.PageSize.Height - 10);
if (!imageDocument.Add(image))
{
throw new Exception("Unable to add image to page!");
}
imageDocument.Close();
imageDocumentWriter.Close();
using (PdfReader imageDocumentReader = new PdfReader(imageMS.ToArray()))
{
var page = pdf.GetImportedPage(imageDocumentReader, 1);
pdf.AddPage(page);
imageDocumentReader.Close();
}
}
}
}
}
break;
}
catch (Exception e)
{
e.Data["FileName"]=info.DocumentName;
throw e;
}
}
}
catch (Exception e)
{
var k = e.Message;
}
}
}
}
if (doc.IsOpen()) doc.Close();
return ms.ToArray();
}
}
}

How to display Unicode characters in iTextSharo by using C# on MVC

I want convert HTML to PDF by using iTextSharp, work success for English characters but when using Unicode characters PDF convert but dose not display Unicode characters
This answer not help me
Thanks for help :)
My Code
public ActionResult Export()
{
Byte[] bytes;
using (var ms = new MemoryStream())
{
using (var doc = new Document())
{
using (var writer = PdfWriter.GetInstance(doc, ms))
{
doc.Open();
var example_html = #"<p> سڵاو</p>";
var example_css = #".headline{font-size:200%}";
using (var htmlWorker = new iTextSharp.text.html.simpleparser.HTMLWorker(doc))
{
using (var sr = new StringReader(example_html))
{
htmlWorker.Parse(sr);
}
}
using (var srHtml = new StringReader(example_html))
{
iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
}
using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(example_css)))
{
using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(example_html)))
{
iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss, System.Text.Encoding.UTF8);
}
}
doc.Close();
}
}
bytes = ms.ToArray();
}
return File(bytes, "application/pdf", "test.pdf");
}
Alternative a Way
I try use Rotativa it's easy to use, and it follow the same architecture of MVC
PM > Install-Package Rotativa.MVC -Version 2.0.3
USE
try
{
return new ActionAsPdf("Contact"); // this view or html for convert PDF
}
catch (Exception e)
{
return RedirectToAction("GeneralError", "Error");
}

itextsharp "the document has no pages" error when i have anchor tag

I am converting some html to pdf. It is working fine but when i have anchor tag in my html i get error the document has no pages
My code is
byte[] data;
using (var sr = new StringReader(sw.ToString()))
{
var st = new StyleSheet();
GetStyleSheetForUnicodeCharacters(st);
using (var ms = new MemoryStream())
{
using (var pdfDoc = new Document())
{
using (var w = PdfWriter.GetInstance(pdfDoc, ms))
{
pdfDoc.Open();
var parsedHtmlElements = HTMLWorker.ParseToList(sr, st);
foreach (var htmlElement in parsedHtmlElements)
{
pdfDoc.Add(htmlElement as IElement);
}
pdfDoc.Close();
data = ms.ToArray();
}
}
}
}
The problem may be invalid html. One way to check is to run your html source through a validator such W3C Markup Validation Service.
have you already tried to add a Page with:
pdfDoc.NewPage();
I think your Code should look like this:
byte[] data;
using (var sr = new StringReader(sw.ToString()))
{
var st = new StyleSheet();
GetStyleSheetForUnicodeCharacters(st);
using (var ms = new MemoryStream())
{
using (var pdfDoc = new Document())
{
using (var w = PdfWriter.GetInstance(pdfDoc, ms))
{
pdfDoc.Open();
pdfDoc.NewPage(); // add Page here
var parsedHtmlElements = HTMLWorker.ParseToList(sr, st);
foreach (var htmlElement in parsedHtmlElements)
{
pdfDoc.Add(htmlElement as IElement);
}
pdfDoc.Close();
data = ms.ToArray();
}
}
}
}
You can also add a blank Page by using:
pdfDoc.newPage();
w.setPageEmpty(false);
MfG chris
Need to check that any html tags are mismatched. Example /td>, this types of mistake raised above error.

Editing custom XML part in word document sometimes corrupts document

We have a system that stores some custom templating data in a Word document. Sometimes, updating this data causes Word to complain that the document is corrupted. When that happens, if I unzip the docx file and compare the contents to the previous version, the only difference appears to be the expected change in the customXML\item.xml file. If I re-zip the contents using 7zip, it seems to work OK (Word no longer complains that the document is corrupt).
The (simplified) code:
void CreateOrReplaceCustomXml(string filename, MyCustomData data)
{
using (var doc = WordProcessingDocument.Open(filename, true))
{
var part = GetCustomXmlParts(doc).SingleOrDefault();
if (part == null)
{
part = doc.MainDocumentPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
}
var serializer = new DataContractSerializer(typeof(MyCustomData));
using (var stream = new MemoryStream())
{
serializer.WriteObject(stream, data);
stream.Seek(0, SeekOrigin.Begin);
part.FeedData(stream);
}
}
}
IEnumerable<CustomXmlPart> GetCustomXmlParts(WordProcessingDocument doc)
{
return doc.MainDocumentPart.CustomXmlParts
.Where(part =>
{
using (var stream = doc.Package.GePart(c.Uri).GetStream())
using (var streamReader = new StreamReader(stream))
{
return streamReader.ReadToEnd().Contains("Some.Namespace");
}
});
}
Any suggestions?
Since re-zipping works, it seems the content is well-formed.
So it sounds like the zip process is at fault. So open the corrupted docx in 7-Zip, and take note of the values in the "method" column (especially for customXML\item.xml).
Compare that value to a working docx - is it the same or different? Method "Deflate" works.
I faced the same issue and it turned out it was due to encoding.
Do you already specify the same encoding when serializing/deserializing?
Couple of suggestion
a. Try doc.Package.Flush(); after you write the data back into the custom xml.
b. You may have to delete all custom part and add a new custom part. We are using the following code and it seems working fine.
public static void ReplaceCustomXML(WordprocessingDocument myDoc, string customXML)
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
mainPart.DeleteParts<CustomXmlPart>(mainPart.CustomXmlParts);
CustomXmlPart customXmlPart = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
using (StreamWriter ts = new StreamWriter(customXmlPart.GetStream()))
{
ts.Write(customXML);
ts.Flush();
ts.Close();
}
}
public static MemoryStream GetCustomXmlPart(MainDocumentPart mainPart)
{
foreach (CustomXmlPart part in mainPart.CustomXmlParts)
{
using (XmlTextReader reader =
new XmlTextReader(part.GetStream(FileMode.Open, FileAccess.Read)))
{
reader.MoveToContent();
if (reader.Name.Equals("aaaa", StringComparison.OrdinalIgnoreCase))
{
string str = reader.ReadOuterXml();
byte[] byteArray = Encoding.ASCII.GetBytes(str);
MemoryStream stream = new MemoryStream(byteArray);
return stream;
}
}
}
return null; //result;
}
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(ms, true))
{
StreamReader reader = new StreamReader(memStream);
string FullXML = reader.ReadToEnd();
ReplaceCustomXML(myDoc, FullXML);
myDoc.Package.Flush();
//Code to save file
}

iTextSharp is producing a corrupt PDF

The code snippet below returns a corrupt PDF document however if I return mergedDocument instead it always returns a valid PDF. mergedDocument is based on a PDF file i created using Word, whereas completed document is entirely programmatically generated. The code "works" in that it throws no exceptions. Why is iTextSharp creating a corrupt PDF?
byte[] completedDocument = null;
using (MemoryStream streamCompleted = new MemoryStream())
{
using (Document document = new Document())
{
PdfCopy copy = new PdfCopy(document, streamCompleted);
document.Open();
copy.Open();
foreach (var item in eventItems)
{
byte[] mergedDocument = null;
PdfReader reader = new PdfReader(pdfTemplates[item.DataTokens[NotifyTokenType.OrganisationID]]);
using (MemoryStream streamTemplate = new MemoryStream())
{
using (PdfStamper stamper = new PdfStamper(reader, streamTemplate))
{
foreach (var token in item.DataTokens)
{
if (stamper.AcroFields.Fields.Any(fld => fld.Key == token.Key.ToString()))
{
stamper.AcroFields.SetField(token.Key.ToString(), token.Value);
}
}
stamper.FormFlattening = true;
stamper.Writer.CloseStream = false;
}
mergedDocument = new byte[streamTemplate.Length];
streamTemplate.Position = 0;
streamTemplate.Read(mergedDocument, 0, (int)streamTemplate.Length);
}
reader = new PdfReader(mergedDocument);
for (int i = 1; i <= reader.NumberOfPages; i++)
{
document.SetPageSize(PageSize.A4);
copy.AddPage(copy.GetImportedPage(reader, i));
}
}
completedDocument = new byte[streamCompleted.Length];
streamCompleted.Position = 0;
streamCompleted.Read(completedDocument, 0, (int)streamCompleted.Length);
}
}
return completedDocument;
You need to close the document and copy objects to flush the PDF writing buffer. This, however, causes some problems when trying to read the stream into an array. The fix for that is to use the ToArray() method of the MemoryStream which still works on closed streams. The changes I made have comments on them.
byte[] completedDocument = null;
using (MemoryStream streamCompleted = new MemoryStream())
{
using (Document document = new Document())
{
PdfCopy copy = new PdfCopy(document, streamCompleted);
document.Open();
copy.Open();
foreach (var item in eventItems)
{
byte[] mergedDocument = null;
PdfReader reader = new PdfReader(pdfTemplates[item.DataTokens[NotifyTokenType.OrganisationID]]);
using (MemoryStream streamTemplate = new MemoryStream())
{
using (PdfStamper stamper = new PdfStamper(reader, streamTemplate))
{
foreach (var token in item.DataTokens)
{
if (stamper.AcroFields.Fields.Any(fld => fld.Key == token.Key.ToString()))
{
stamper.AcroFields.SetField(token.Key.ToString(), token.Value);
}
}
stamper.FormFlattening = true;
stamper.Writer.CloseStream = false;
}
//Copy the stream's bytes
mergedDocument = streamTemplate.ToArray();
}
reader = new PdfReader(mergedDocument);
for (int i = 1; i <= reader.NumberOfPages; i++)
{
document.SetPageSize(PageSize.A4);
copy.AddPage(copy.GetImportedPage(reader, i));
}
//Close the document and the copy
document.Close();
copy.Close();
}
//ToArray() can operate on closed streams
completedDocument = streamCompleted.ToArray();
}
}
return completedDocument;
Also make sure your html doesn't contains hr tag while converting html to pdf
hdnEditorText.Value.Replace("\"", "'").Replace("<hr />", "").Replace("<hr/>", "")

Categories

Resources