Error while parsing Tinymce Editor string - c#

I am parsing the HTML with the help of following code. The html comes from tinymce editor. I have just shortened my code. There can be any number of images in the EmailBody string as it is selected by the user in tinymce editor.
All works good except when there is <img src=""> tag in the Email Body.
I get the error on this line htmlWorker.Parse(sr);
string EmailBody = #"<p><img src=""http://weknowyourdreams.com/images/smile/smile-07.jpg""></p>";
using (var ms = new MemoryStream())
{
//Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
using (var doc = new Document())
{
//Create a writer that's bound to our PDF abstraction and our stream
using (var writer = PdfWriter.GetInstance(doc, ms))
{
//Open the document for writing
doc.Open();
using (var htmlWorker = new iTextSharp.text.html.simpleparser.HTMLWorker(doc))
{
//HTMLWorker doesn't read a string directly but instead needs a TextReader (which StringReader subclasses)
using (var sr = new StringReader(EmailBody))
{
//Parse the HTML
htmlWorker.Parse(sr);
}
}
doc.Close();
}
}
bytes = ms.ToArray();
}
Gives me this error:
Cannot access a closed Stream
How to fix this error?

Related

itext7 CopyPagesTo not opening PDF

I am trying to add a Cover Page PDF file to another PDF file. I am using CopyPagesTo method. CoverPageFilePath will go before any pages in the pdfDocumentFile. I then need to rewrite that new file to the same location. When I run the code and open the new pdf file I get an error about it being damaged.
public static void iText7MergePDF()
{
byte[] modifiedPdfInBytes = null;
string pdfCoverPageFilePath = #"PathtoCoverPage\Cover Page.pdf";
PdfDocument pdfDocumentCover = new PdfDocument(new iText.Kernel.Pdf.PdfReader(pdfCoverPageFilePath));
string pdfDocumentFile =#"PathtoFullDocument.pdf";
var buffer = File.ReadAllBytes(pdfDocumentFile);
using (var originalPdfStream = new MemoryStream(buffer))
using (var modifiedPdfStream = new MemoryStream())
{
var pdfReader = new iText.Kernel.Pdf.PdfReader(originalPdfStream);
var pdfDocument = new PdfDocument(pdfReader, new PdfWriter(modifiedPdfStream));
int numberOfPages = pdfDocumentCover.GetNumberOfPages();
pdfDocumentCover.CopyPagesTo(1, numberOfPages, pdfDocument);
modifiedPdfInBytes = modifiedPdfStream.ToArray();
pdfDocument.Close();
}
System.IO.File.WriteAllBytes(pdfGL, modifiedPdfInBytes);
}
Whenever you have some other type, like a StreamWriter, or here a PdfWriter writing to a Stream, it may not write all the data to the Stream immediately.
Here you Close the pdfDocument for all the data to be written to the MemoryStream.
ie this
modifiedPdfInBytes = modifiedPdfStream.ToArray();
pdfDocument.Close();
Should be
pdfDocument.Close();
modifiedPdfInBytes = modifiedPdfStream.ToArray();

itext7: InvalidCastException: Unable to cast object of type iText.Html2pdf.Attach.Impl.Layout.HtmlPageBreak to type iText.Layout.Element.IBlockElement

Using itext7 (7.2.0) AND itext7.pdfhtml (4.0.0) AND .Net Core 5.0
Converting itext5 report to itext7
Getting the error, when forcing a page break using html style 'page-break-before: always;'
public FileResult PrintHtmlToPDFPageBreak()
{
StringBuilder sbBody = new StringBuilder();
sbBody.Append("<html>");
sbBody.Append("<body>");
sbBody.Append("<p>This is first page</p>");
sbBody.Append("<div style='page-break-before: always;'></div>");
sbBody.Append("<p>This is second page</p>");
sbBody.Append("</body>");
sbBody.Append("</html>");
string htmlContent = sbBody.ToString();
bool isPortrait = true;
string reportTitle = "Testing iText7 in .Net5";
//generate the byte array for the Pdf
byte[] pdfContent = null;
//Create a System.IO.MemoryStream object
using (MemoryStream memoryStream = new MemoryStream())
{
//Initialize PDF writer
PdfWriter pdfWriter = new PdfWriter(memoryStream);
//Initialize PDF document
PdfDocument pdfDocument = new PdfDocument(pdfWriter);
//Initialize document
Document document = (isPortrait ? new Document(pdfDocument, PageSize.LETTER) : new Document(pdfDocument, PageSize.LETTER.Rotate()));
var headerHeight = String.IsNullOrEmpty(reportTitle) ? 70f : 120f;
document.SetMargins(headerHeight, 10f, 56f, 10f); //top, right, bottom, left
#region HTML to PDF
//Convert to Elements
ConverterProperties converterProperties = new ConverterProperties();
IList<IElement> elements = HtmlConverter.ConvertToElements(htmlContent, converterProperties);
foreach (var element in elements)
document.Add((IBlockElement)element);
#endregion
//Close the Document
document.Close();
pdfContent = memoryStream.ToArray();
//Close the MemoryStream
memoryStream.Close();
}
//return the byte array in the form of FileContentResult for browser download
var fileName = "ConvertHtmlToPDF.pdf";
return File(pdfContent, System.Net.Mime.MediaTypeNames.Application.Pdf, fileName);
}
You haven't attached the stacktrace but I am pretty sure the problem is in these two lines:
foreach (var element in elements)
document.Add((IBlockElement)element);
You are just casting without checking whether the cast is legitimate. Just process the case when an element is an instance of AreaBreak.
Thank you Alexey your solution worked.
Old code:
foreach (var element in elements)
document.Add((IBlockElement)element);
New code:
foreach (var element in elements)
{
if (element.GetType().Name == "HtmlPageBreak")
document.Add(new AreaBreak(AreaBreakType.NEXT_PAGE));
else
document.Add((IBlockElement)element);
}

Generate one pdf document with multiple pages converting from html using IText 7

I'm working with IText 7, I've been able to get one html page and generate a pdf for that page, but I need to generate one pdf document from multiple html pages and separated by pages. For example: I have Page1.html, Page2.html and Page3.html. I will need a pdf document with 3 pages, the first page with the content of Page1.html, second page with the content of Page2.html and like that...
This is the code I have and it's working for one html page:
ConverterProperties properties = new ConverterProperties();
PdfWriter writer = new PdfWriter(pdfRoot, new WriterProperties().SetFullCompressionMode(true));
PdfDocument pdfDocument = new PdfDocument(writer);
pdfDocument.AddEventHandler(PdfDocumentEvent.END_PAGE, new HeaderPdfEventHandler());
HtmlConverter.ConvertToPdf(htmlContent, pdfDocument, properties);
Is it possible to loop against the multiple html pages, add a new page to the PdfDocument for every html page and then have only one pdf generated with one page per html page?
UPDATE
I've been following this example and trying to translate it from Java to C#, I'm trying to use PdfMerger and loop around the html pages... but I'm receiving the Exception Cannot access a closed stream, on this line:
temp = new PdfDocument(
new PdfReader(new RandomAccessSourceFactory().CreateSource(baos), rp));
It looks like is related to the ByteArrayOutputStream baos instance. Any suggestions? This is my current code:
foreach (var html in htmlList)
{
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfDocument temp = new PdfDocument(new PdfWriter(baos));
HtmlConverter.ConvertToPdf(html, temp, properties);
ReaderProperties rp = new ReaderProperties();
temp = new PdfDocument(
new PdfReader(new RandomAccessSourceFactory().CreateSource(baos), rp));
merger.Merge(temp, 1, temp.GetNumberOfPages());
temp.Close();
}
pdfDocument.Close();
You are using RandomAccessSourceFactory and passing there a closed stream which you wrote a PDF document into. RandomAccessSourceFactory expects an input stream instead that is ready to be read.
First of all you should use MemoryStream which is native to .NET world. ByteArrayOutputStream is the class that was ported from Java for internal purposes (although it extends MemoryStream as well). Secondly, you don't have to use RandomAccessSourceFactory - there is a simpler way.
You can create a new MemoryStream instance from the bytes of the MemoryStream that you used to create a temporary PDF with the following line:
baos = new MemoryStream(baos.ToArray());
As an additional remark, it's better to close PdfMerger instance directly instead of closing the document - closing PdfMerger closes the underlying document as well.
All in all, we get the following code that works:
foreach (var html in htmlList)
{
MemoryStream baos = new MemoryStream();
PdfDocument temp = new PdfDocument(new PdfWriter(baos));
HtmlConverter.ConvertToPdf(html, temp, properties);
ReaderProperties rp = new ReaderProperties();
baos = new MemoryStream(baos.ToArray());
temp = new PdfDocument(new PdfReader(baos, rp));
pdfMerger.Merge(temp, 1, temp.GetNumberOfPages());
temp.Close();
}
pdfMerger.Close();
Maybe not so succinctly. I use "using". Similar answer
private byte[] CreatePDF(string html)
{
byte[] binData;
using (var workStream = new MemoryStream())
{
using (var pdfWriter = new PdfWriter(workStream))
{
//Create one pdf document
using (var pdfDoc = new PdfDocument(pdfWriter))
{
pdfDoc.SetDefaultPageSize(iText.Kernel.Geom.PageSize.A4.Rotate());
//Create one pdf merger
var pdfMerger = new PdfMerger(pdfDoc);
//Create two identical pdfs
for (int i = 0; i < 2; i++)
{
using (var newStream = new MemoryStream(CreateDocument(html)))
{
ReaderProperties rp = new ReaderProperties();
using (var newPdf = new PdfDocument(new PdfReader(newStream, rp)))
{
pdfMerger.Merge(newPdf, 1, newPdf.GetNumberOfPages());
}
}
}
}
binData = workStream.ToArray();
}
}
return binData;
}
Create pdf
private byte[] CreateDocument(string html)
{
byte[] binData;
using (var workStream = new MemoryStream())
{
using (var pdfWriter = new PdfWriter(workStream))
{
using (var pdfDoc = new PdfDocument(pdfWriter))
{
pdfDoc.SetDefaultPageSize(iText.Kernel.Geom.PageSize.A4.Rotate());
ConverterProperties props = new ConverterProperties();
using (var document = HtmlConverter.ConvertToDocument(html, pdfDoc, props))
{
}
}
binData = workStream.ToArray();
}
}
return binData;
}

Search And Replace Text in OPENXML (Added file)

I know there is alot of posts on it, BUT nothing worked for my problem:
Im using OPENxml to create word document, and I am adding some ready files to the document during the creation. I want to change some text in the file that I am adding after the document is ready. So thats what I tried:
First creating the document:
fileName = HttpContext.Current.Server.MapPath("~/reports/"+fileName+".docx");
using (var doc = WordprocessingDocument.Create(
fileName, WordprocessingDocumentType.Document))
{
///add files and content inside the document
addContentFile("template1part1", HttpContext.Current.Server.MapPath("~/templates/template1part1.docx"), mainPart);
}
this is how I am adding the files:
private static void addContentFile(string id,string path, MainDocumentPart mainPart){
string altChunkId = id;
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.WordprocessingML, altChunkId);
using (FileStream fileStream = File.Open(path, FileMode.Open))
{
chunk.FeedData(fileStream);
fileStream.Close();
}
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainPart.Document.Body.Append(altChunk);
mainPart.Document.Save();
}
And this is how I am trying to replace text AFTER I created the file (after i finished to use WordprocessingDocument)
First try:
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
docText = sr.ReadToEnd();
docText = new Regex(findText, RegexOptions.IgnoreCase).Replace(docText, replaceText);
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
sw.Write(docText);
}
Second try:
using ( WordprocessingDocument doc =
WordprocessingDocument.Open(#"yourpath\testdocument.docx", true))
{
var body = doc.MainDocumentPart.Document.Body;
var paras = body.Elements<Paragraph>();
foreach (var para in paras)
{
foreach (var run in para.Elements<Run>())
{
foreach (var text in run.Elements<Text>())
{
if (text.Text.Contains("text-to-replace"))
{
text.Text = text.Text.Replace("text-to-replace", "replaced-text");
}
}
}
}
}
}
None of them worked, and I tried much more.
Its worked for text that I am manually add to the document, but its now working for text that I am adding from the ready files.
there is a way to do it?
The way you are adding the files are using altchuncks. But you are trying to replace things as if you are modifying the resulting document's openxml.
When you merge documents as altchuncks you are basically adding them as embedded external files to the original document but not as openxml markup. Which means you cannot treat the additional attached documents as openxml documents.
If you want to achieve what you are trying, you have to merge the documents as explained in my answer here - https://stackoverflow.com/a/18352219/860243 which makes the resulting document a proper openxml document. Which allows you to modify it later as you wish.

Editing custom XML part in word document sometimes corrupts document

We have a system that stores some custom templating data in a Word document. Sometimes, updating this data causes Word to complain that the document is corrupted. When that happens, if I unzip the docx file and compare the contents to the previous version, the only difference appears to be the expected change in the customXML\item.xml file. If I re-zip the contents using 7zip, it seems to work OK (Word no longer complains that the document is corrupt).
The (simplified) code:
void CreateOrReplaceCustomXml(string filename, MyCustomData data)
{
using (var doc = WordProcessingDocument.Open(filename, true))
{
var part = GetCustomXmlParts(doc).SingleOrDefault();
if (part == null)
{
part = doc.MainDocumentPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
}
var serializer = new DataContractSerializer(typeof(MyCustomData));
using (var stream = new MemoryStream())
{
serializer.WriteObject(stream, data);
stream.Seek(0, SeekOrigin.Begin);
part.FeedData(stream);
}
}
}
IEnumerable<CustomXmlPart> GetCustomXmlParts(WordProcessingDocument doc)
{
return doc.MainDocumentPart.CustomXmlParts
.Where(part =>
{
using (var stream = doc.Package.GePart(c.Uri).GetStream())
using (var streamReader = new StreamReader(stream))
{
return streamReader.ReadToEnd().Contains("Some.Namespace");
}
});
}
Any suggestions?
Since re-zipping works, it seems the content is well-formed.
So it sounds like the zip process is at fault. So open the corrupted docx in 7-Zip, and take note of the values in the "method" column (especially for customXML\item.xml).
Compare that value to a working docx - is it the same or different? Method "Deflate" works.
I faced the same issue and it turned out it was due to encoding.
Do you already specify the same encoding when serializing/deserializing?
Couple of suggestion
a. Try doc.Package.Flush(); after you write the data back into the custom xml.
b. You may have to delete all custom part and add a new custom part. We are using the following code and it seems working fine.
public static void ReplaceCustomXML(WordprocessingDocument myDoc, string customXML)
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
mainPart.DeleteParts<CustomXmlPart>(mainPart.CustomXmlParts);
CustomXmlPart customXmlPart = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
using (StreamWriter ts = new StreamWriter(customXmlPart.GetStream()))
{
ts.Write(customXML);
ts.Flush();
ts.Close();
}
}
public static MemoryStream GetCustomXmlPart(MainDocumentPart mainPart)
{
foreach (CustomXmlPart part in mainPart.CustomXmlParts)
{
using (XmlTextReader reader =
new XmlTextReader(part.GetStream(FileMode.Open, FileAccess.Read)))
{
reader.MoveToContent();
if (reader.Name.Equals("aaaa", StringComparison.OrdinalIgnoreCase))
{
string str = reader.ReadOuterXml();
byte[] byteArray = Encoding.ASCII.GetBytes(str);
MemoryStream stream = new MemoryStream(byteArray);
return stream;
}
}
}
return null; //result;
}
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(ms, true))
{
StreamReader reader = new StreamReader(memStream);
string FullXML = reader.ReadToEnd();
ReplaceCustomXML(myDoc, FullXML);
myDoc.Package.Flush();
//Code to save file
}

Categories

Resources