Convert a PDF to PDF/A using iText 7?

I need to convert a PDF 1.7 file, which I receive as a stream, to PDF/A-3. The original file already has its fonts and images embedded.
So far I did:
var output = new MemoryStream();
var reader = new PdfReader(stream);
var writer = new PdfWriter(output);
Stream s = new FileStream("sRGB Color Space Profile.icm", FileMode.Open, FileAccess.Read);
var intent = new PdfOutputIntent("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", s);
var pdfA = new PdfADocument(writer, PdfAConformanceLevel.PDF_A_3A, intent);
//Setting some required parameters
pdfA.SetTagged();
pdfA.GetCatalog().SetLang(new PdfString("en-US"));
pdfA.GetCatalog().SetViewerPreferences(new PdfViewerPreferences().SetDisplayDocTitle(true));
PdfDocumentInfo info = pdfA.GetDocumentInfo();
info.SetTitle("test pdf/a");
However, I am stuck on how to copy the content of the input stream into the PDF/A document and return the PDF/A file as a stream.
Furthermore, can I use iText to perform a conformance test on the output?
Any ideas?
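A minimal sketch of the missing copy step, assuming the iText 7.x .NET API (the itext7 NuGet package, in which PdfAConformanceLevel is still used) and the ICC profile file from the question. The idea is to open the source with its own PdfDocument, copy every page into the PdfADocument with CopyPagesTo, close the PDF/A document so that everything is flushed, and wrap the resulting bytes in a new MemoryStream. The class and method names are hypothetical. The sketch uses PDF_A_3B instead of PDF_A_3A because level A requires a properly tagged structure tree, which an arbitrary source PDF may not have.

```csharp
using System.IO;
using iText.Kernel.Pdf;
using iText.Pdfa;

public static class PdfAConversion
{
    // Hypothetical helper: copies all pages of `input` into a new PDF/A-3B document
    // and returns the result as a readable MemoryStream positioned at 0.
    public static MemoryStream ConvertToPdfA(Stream input, string iccProfilePath)
    {
        var output = new MemoryStream();

        using (var icc = new FileStream(iccProfilePath, FileMode.Open, FileAccess.Read))
        using (var source = new PdfDocument(new PdfReader(input)))
        {
            var intent = new PdfOutputIntent("Custom", "", "http://www.color.org",
                "sRGB IEC61966-2.1", icc);

            // Level B does not require a tagged structure tree;
            // use PDF_A_3A only if the source document is properly tagged.
            var pdfA = new PdfADocument(new PdfWriter(output),
                PdfAConformanceLevel.PDF_A_3B, intent);
            pdfA.GetDocumentInfo().SetTitle("test pdf/a");

            // Copy every page (content, resources, annotations) into the PDF/A document.
            source.CopyPagesTo(1, source.GetNumberOfPages(), pdfA);

            // Close() writes all remaining objects to `output` and then closes it.
            pdfA.Close();
        }

        // ToArray() still works on a closed MemoryStream.
        return new MemoryStream(output.ToArray());
    }
}
```

On the second question: PdfADocument checks the PDF/A rules while the document is being written and throws a PdfAConformanceException when it finds a violation, so a clean Close() is a useful first signal. It is not a complete validation, though. For an independent conformance report, a dedicated validator such as veraPDF is the usual choice. Also note that, with the default settings, disposing the source PdfDocument closes the input stream as well.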

Related

itext7 CopyPagesTo not opening PDF

I am trying to add a cover page PDF file to another PDF file using the CopyPagesTo method. The cover page (pdfCoverPageFilePath) should go before all pages of pdfDocumentFile, and the new file must then be written back to the same location. When I run the code and open the new PDF file, I get an error saying it is damaged.
public static void iText7MergePDF()
{
byte[] modifiedPdfInBytes = null;
string pdfCoverPageFilePath = @"PathtoCoverPage\Cover Page.pdf";
PdfDocument pdfDocumentCover = new PdfDocument(new iText.Kernel.Pdf.PdfReader(pdfCoverPageFilePath));
string pdfDocumentFile = @"PathtoFullDocument.pdf";
var buffer = File.ReadAllBytes(pdfDocumentFile);
using (var originalPdfStream = new MemoryStream(buffer))
using (var modifiedPdfStream = new MemoryStream())
{
var pdfReader = new iText.Kernel.Pdf.PdfReader(originalPdfStream);
var pdfDocument = new PdfDocument(pdfReader, new PdfWriter(modifiedPdfStream));
int numberOfPages = pdfDocumentCover.GetNumberOfPages();
pdfDocumentCover.CopyPagesTo(1, numberOfPages, pdfDocument);
modifiedPdfInBytes = modifiedPdfStream.ToArray();
pdfDocument.Close();
}
System.IO.File.WriteAllBytes(pdfGL, modifiedPdfInBytes);
}

Whenever another object writes to a Stream on your behalf, such as a StreamWriter or, here, a PdfWriter, it may not write all of its data to that Stream immediately.
Here you have to close pdfDocument first, so that all of the data is written to the MemoryStream before you read it.
I.e., this:
modifiedPdfInBytes = modifiedPdfStream.ToArray();
pdfDocument.Close();
should be:
pdfDocument.Close();
modifiedPdfInBytes = modifiedPdfStream.ToArray();
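The same effect can be reproduced with only the .NET base classes, which shows why the order of Close() and ToArray() matters. This small demo (the class name BufferedWriterDemo is hypothetical) writes text through a StreamWriter into a MemoryStream and reads the stream's bytes before and after the writer is closed.

```csharp
using System.IO;
using System.Text;

public static class BufferedWriterDemo
{
    // Returns how many bytes the MemoryStream held before the writer was closed,
    // and the text it held afterwards.
    public static (int beforeClose, string afterClose) Run()
    {
        var ms = new MemoryStream();
        var writer = new StreamWriter(ms, new UTF8Encoding(false)); // UTF-8 without BOM
        writer.Write("hello");

        // The text is still sitting in the writer's internal buffer.
        int before = ms.ToArray().Length;

        // Close() flushes the buffer into `ms` and then closes `ms`;
        // ToArray() is still allowed on a closed MemoryStream.
        writer.Close();
        string after = Encoding.UTF8.GetString(ms.ToArray());

        return (before, after);
    }
}
```

PdfWriter behaves the same way, except that it holds back far more (cross-reference table, trailer), so reading early gives a truncated, "damaged" PDF instead of an empty one.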

How to convert DocX from Xceed.Words.NET library to pdf and save it in a memory stream

I want to convert a Word document, held as a byte array, into a PDF byte array.
I am using the Xceed.Words.NET library:
var stream = new MemoryStream(sourceFile.AttachedFile);
var doc = DocX.Load(stream);
var ms = new MemoryStream();
doc.SaveAs(ms);
var wByteArray = ms.GetBuffer();

Use this:
var stream = new MemoryStream(sourceFile.AttachedFile);
using (var document = DocX.Load(stream))
{
stream = new MemoryStream();
DocX.ConvertToPdf(document, stream);
}
var bytes = stream.ToArray();
Note that you need the professional version of the DocX library to convert a Word document to PDF.
If you're looking for a free solution, you could try GemBox.Document instead. Its free version does support converting to PDF, but it has a document size limitation.
You can use it like this:
ComponentInfo.SetLicense("FREE-LIMITED-KEY");
var stream = new MemoryStream(sourceFile.AttachedFile);
var document = DocumentModel.Load(stream, LoadOptions.DocxDefault);
stream = new MemoryStream();
document.Save(stream, SaveOptions.PdfDefault);
var bytes = stream.ToArray();
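The question's code reads the result with ms.GetBuffer(), whereas both answers use ToArray(). GetBuffer() returns the MemoryStream's entire internal array, including unused capacity beyond Length, so the resulting byte array can carry trailing zero bytes. ToArray() copies exactly Length bytes. A small base-class-only illustration (the helper name is hypothetical):

```csharp
using System.IO;

public static class BufferVsArray
{
    public static (int bufferLength, int arrayLength, long streamLength) Compare()
    {
        // Pre-allocate 1024 bytes of capacity, then write only 5 bytes.
        var ms = new MemoryStream(1024);
        ms.Write(new byte[] { 1, 2, 3, 4, 5 }, 0, 5);

        // GetBuffer(): the whole internal array, unused capacity included.
        int bufferLength = ms.GetBuffer().Length;

        // ToArray(): a copy of exactly the bytes that were written.
        int arrayLength = ms.ToArray().Length;

        return (bufferLength, arrayLength, ms.Length);
    }
}
```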

C# Spire Document.SaveToStream not working

I have the following code, but it only creates an empty (0 KB) file.
using (var stream1 = new MemoryStream())
{
MemoryStream txtStream = new MemoryStream();
Document document = new Document();
fileInformation.Stream.CopyTo(stream1);
document.LoadFromStream(stream1, FileFormat.Auto);
document.SaveToStream(txtStream, FileFormat.Txt);
StreamReader reader = new StreamReader(txtStream);
string text = reader.ReadToEnd();
System.IO.File.WriteAllText(fileName + ".txt", text);
}
I know the data is loaded into document successfully, because if I call document.SaveToTxt("test.txt", Encoding.UTF8); instead of the SaveToStream line, the file is exported properly.
What am I doing wrong?

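After data is written into a stream (by CopyTo, SaveToStream, and so on), the stream's Position is left at the end of that data, so whatever reads from it next finds nothing. Reset the position to 0 before each read: stream1 after CopyTo and before LoadFromStream, and txtStream after SaveToStream and before creating the StreamReader: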
stream1.Position = 0;
txtStream.Position = 0;
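The effect is easy to reproduce with plain MemoryStreams. In this sketch (the PositionDemo name is hypothetical), CopyTo leaves the destination's Position at the end of the copied data, so a StreamReader created right afterwards reads an empty string. Rewinding to 0 first returns the content.

```csharp
using System.IO;
using System.Text;

public static class PositionDemo
{
    // Copies `content` into a second MemoryStream and reads it back as text,
    // optionally rewinding the copy before reading.
    public static string CopyAndRead(string content, bool rewind)
    {
        var source = new MemoryStream(Encoding.UTF8.GetBytes(content));
        var copy = new MemoryStream();
        source.CopyTo(copy); // leaves copy.Position at the end of the data

        if (rewind)
        {
            copy.Position = 0; // start reading from the first byte again
        }

        using (var reader = new StreamReader(copy))
        {
            return reader.ReadToEnd();
        }
    }
}
```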

Uploading ".xlsx" file using DropBox API making file corrupted

DropboxClient dbx = new DropboxClient("************************");
var file = "/Excel/FileName.xlsx";
byte[] bytes = null;
FileStream fs = new FileStream("C:\\Users\\Admin\\Desktop\\Test.xlsx", FileMode.Open, FileAccess.Read);
BinaryReader br = new BinaryReader(fs);
long numBytes = fs.Length;
bytes = br.ReadBytes((int)numBytes);
var mem = new MemoryStream(Encoding.UTF8.GetBytes(bytes.ToString()));
var updated = await dbx.Files.UploadAsync(file, WriteMode.Overwrite.Instance, body: mem);
This code overwrites the existing file as needed, but the uploaded file is corrupted.

I think you're overcomplicating this. UploadAsync expects a Stream. A MemoryStream is indeed a Stream, but so is a FileStream, so you can pass the file stream directly. Getting rid of the extra reader results in:
var source = "C:\\Users\\Admin\\Desktop\\Test.xlsx";
var target = "/Excel/FileName.xlsx";
using(var dbx = new DropboxClient("***"))
using(var fs = new FileStream(source, FileMode.Open, FileAccess.Read))
{
var updated = await dbx.Files.UploadAsync(
target, WriteMode.Overwrite.Instance, body: fs);
}
The file gets corrupted because the data is converted incorrectly. bytes.ToString() returns the type name "System.Byte[]", so you are literally uploading the text System.Byte[] instead of the file's contents, which is not a valid Excel document. Converting a binary file to UTF-8 text doesn't work either, because the conversion alters the bytes being uploaded.
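Both problems can be checked without Dropbox at all. This sketch (the ByteArrayPitfalls name is hypothetical) shows what byte[].ToString() returns and what a UTF-8 round trip does to bytes that are not valid UTF-8. It also shows that, if you already hold the bytes, new MemoryStream(bytes) gives a Stream for the body: parameter with no text conversion involved.

```csharp
using System.IO;
using System.Text;

public static class ByteArrayPitfalls
{
    // What the question's code actually uploads: the type name, not the data.
    public static string WhatToStringReturns(byte[] bytes) => bytes.ToString();

    // Decoding arbitrary bytes as UTF-8 and re-encoding them, which is what
    // treating a binary file as text effectively does.
    public static byte[] Utf8RoundTrip(byte[] bytes) =>
        Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(bytes));

    // Wrapping the bytes directly keeps them unchanged.
    public static Stream AsUploadBody(byte[] bytes) => new MemoryStream(bytes);
}
```

With input bytes 0x50 0x4B 0x03 0x04 (the ZIP signature that an .xlsx file starts with) followed by 0xFF, the invalid 0xFF byte is decoded as the replacement character U+FFFD and re-encoded as three bytes, so the "round-tripped" data is no longer the original file.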

How to export a report created using Telerik Reporting (.trdx) to PDF?

I use Telerik Report Designer (Standalone) to design a report (.trdx). How do I export it to a PDF file programmatically in C#?

I used the code snippet below. It deserializes the .trdx file and creates a report (Telerik.Reporting.Report) instance from it. This report instance can then be rendered to PDF.
System.Xml.XmlReaderSettings settings = new System.Xml.XmlReaderSettings();
settings.IgnoreWhitespace = true;
//read the .trdx file contents
using (System.Xml.XmlReader xmlReader = System.Xml.XmlReader.Create(path_to_your_trdx_file, settings))
{
Telerik.Reporting.XmlSerialization.ReportXmlSerializer xmlSerializer =
new Telerik.Reporting.XmlSerialization.ReportXmlSerializer();
//deserialize the .trdx report XML contents
Telerik.Reporting.Report report = (Telerik.Reporting.Report)
xmlSerializer.Deserialize(xmlReader);
string mimType = string.Empty;
string extension = string.Empty;
Encoding encoding = null;
// call Render() and retrieve raw array of bytes
// write the pdf file
byte[] buffer = Telerik.Reporting.Processing.ReportProcessor.Render(
"PDF", report, null, out mimType, out extension, out encoding);
// create a new file on disk and write the byte array to the file
FileStream fs = new FileStream(Path_you_need_to_save_the_pdf_file, FileMode.Create);
fs.Write(buffer, 0, buffer.Length);
fs.Flush();
fs.Close();
}

If you use Telerik Reporting 2012 or newer, the code above needs to be changed to this:
XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreWhitespace = true;
//read the .trdx file contents
using (XmlReader xmlReader = XmlReader.Create(path_to_your_trdx_file, settings))
{
ReportXmlSerializer xmlSerializer =
new ReportXmlSerializer();
//deserialize the .trdx report XML contents
Report report = (Report)xmlSerializer.Deserialize(xmlReader);
Telerik.Reporting.InstanceReportSource instanceReportSource = new Telerik.Reporting.InstanceReportSource
{
ReportDocument = report
};
string mimType = string.Empty;
string extension = string.Empty;
//Encoding encoding = null;
// call RenderReport() and retrieve the rendered PDF as a raw array of bytes
ReportProcessor reportProcessor = new ReportProcessor();
RenderingResult renderingResult = reportProcessor.RenderReport("PDF", instanceReportSource, null);
// create a new file on disk and write the byte array to the file
FileStream fs = new FileStream(@"D:\test\Dashboard.pdf", FileMode.Create);
fs.Write(renderingResult.DocumentBytes, 0, renderingResult.DocumentBytes.Length);
fs.Flush();
fs.Close();
}
