I'm using CarlosAg ExcelXmlWriter library which generates XML Excel spreadsheets 2003 (*.xml)
I need to find coomercial or free tool just that converts this generated xml spreadsheet into
PDF. I tried SautinSoft library but it didn't work with my desired extension (xml) it only works with xlsx or xls extesnions
thanks guys in advance
Try Aspose.
http://www.aspose.com/categories/.net-components/aspose.cells-for-.net/default.aspx
You might also need the PDF component, not sure how they do it now.
Can you simply use some pdf printer to do it?
Try to use a free solution (EpPlus): https://github.com/EPPlusSoftware/EPPlus
Or SpreadSheet https://spreadsheetlight.com/
An another way:
static void ConvertFromStream()
{
// The conversion process will be done completely in memory.
string inpFile = #"..\..\..\example.xml";
string outFile = #"ResultStream.pdf";
byte[] inpData = File.ReadAllBytes(inpFile);
byte[] outData = null;
using (MemoryStream msInp = new MemoryStream(inpData))
{
// Load a document.
DocumentCore dc = DocumentCore.Load(msInp, new XMLLoadOptions());
// Save the document to PDF format.
using (MemoryStream outMs = new MemoryStream())
{
dc.Save(outMs, new PdfSaveOptions() );
outData = outMs.ToArray();
}
// Show the result for demonstration purposes.
if (outData != null)
{
File.WriteAllBytes(outFile, outData);
System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(outFile) { UseShellExecute = true });
}
}
Related
I have a c# class that takes an HTML and converts it to PDF using wkhtmltopdf.
As you will see below, I am generating 3 PDFs - Landscape, Portrait, and combined of the two.
The properties object contains the html as a string, and the argument for landscape/portrait.
System.IO.MemoryStream PDF = new WkHtmlToPdfConverter().GetPdfStream(properties);
System.IO.FileStream file = new System.IO.FileStream("abc_landscape.pdf", System.IO.FileMode.Create);
PDF.Position = 0;
properties.IsHorizontalOrientation = false;
System.IO.MemoryStream PDF_portrait = new WkHtmlToPdfConverter().GetPdfStream(properties);
System.IO.FileStream file_portrait = new System.IO.FileStream("abc_portrait.pdf", System.IO.FileMode.Create);
PDF_portrait.Position = 0;
System.IO.MemoryStream finalStream = new System.IO.MemoryStream();
PDF.CopyTo(finalStream);
PDF_portrait.CopyTo(finalStream);
System.IO.FileStream file_combined = new System.IO.FileStream("abc_combined.pdf", System.IO.FileMode.Create);
try
{
PDF.WriteTo(file);
PDF.Flush();
PDF_portrait.WriteTo(file_portrait);
PDF_portrait.Flush();
finalStream.WriteTo(file_combined);
finalStream.Flush();
}
catch (Exception)
{
throw;
}
finally
{
PDF.Close();
file.Close();
PDF_portrait.Close();
file_portrait.Close();
finalStream.Close();
file_combined.Close();
}
The PDFs "abc_landscape.pdf" and "abc_portrait.pdf" generate correctly, as expected, but the operation fails when I try to combine the two in a third pdf (abc_combined.pdf).
I am using MemoryStream to preform the merge, and at the time of debug, I can see that the finalStream.length is equal to the sum of the previous two PDFs. But when I try to open the PDF, I see the content of just 1 of the two PDFs.
The same can be seen below:
Additionally, when I try to close the "abc_combined.pdf", I am prompted to save it, which does not happen with the other 2 PDFs.
Below are a few things that I have tried out already, to no avail:
Change CopyTo() to WriteTo()
Merge the same PDF (either Landscape or Portrait one) with itself
In case it is required, below is the elaboration of the GetPdfStream() method.
var htmlStream = new MemoryStream();
var writer = new StreamWriter(htmlStream);
writer.Write(htmlString);
writer.Flush();
htmlStream.Position = 0;
return htmlStream;
Process process = Process.Start(psi);
process.EnableRaisingEvents = true;
try
{
process.Start();
process.BeginErrorReadLine();
var inputTask = Task.Run(() =>
{
htmlStream.CopyTo(process.StandardInput.BaseStream);
process.StandardInput.Close();
});
// Copy the output to a memorystream
MemoryStream pdf = new MemoryStream();
var outputTask = Task.Run(() =>
{
process.StandardOutput.BaseStream.CopyTo(pdf);
});
Task.WaitAll(inputTask, outputTask);
process.WaitForExit();
// Reset memorystream read position
pdf.Position = 0;
return pdf;
}
catch (Exception ex)
{
throw ex;
}
finally
{
process.Dispose();
}
Merging pdf in C# or any other language is not straight forward with out using 3rd party library.
I assume your requirement for not using library is that most Free libraries, nuget packages has limitation or/and cost money for commercial use.
I have made research and found you an Open Source library called PdfClown with nuget package, it is also available for Java. It is Free with out limitation (donate if you like). The library has a lot of features. One such you can merge 2 or more documents to one document.
I supply my example that take a folder with multiple pdf files, merged it and save it to same or another folder. It is also possible to use MemoryStream, but I do not find it necessary in this case.
The code is self explaining, the key point here is using SerializationModeEnum.Incremental:
public static void MergePdf(string srcPath, string destFile)
{
var list = Directory.GetFiles(Path.GetFullPath(srcPath));
if (string.IsNullOrWhiteSpace(srcPath) || string.IsNullOrWhiteSpace(destFile) || list.Length <= 1)
return;
var files = list.Select(File.ReadAllBytes).ToList();
using (var dest = new org.pdfclown.files.File(new org.pdfclown.bytes.Buffer(files[0])))
{
var document = dest.Document;
var builder = new org.pdfclown.tools.PageManager(document);
foreach (var file in files.Skip(1))
{
using (var src = new org.pdfclown.files.File(new org.pdfclown.bytes.Buffer(file)))
{ builder.Add(src.Document); }
}
dest.Save(destFile, SerializationModeEnum.Incremental);
}
}
To test it
var srcPath = #"C:\temp\pdf\input";
var destFile = #"c:\temp\pdf\output\merged.pdf";
MergePdf(srcPath, destFile);
Input examples
PDF doc A and PDF doc B
Output example
Links to my research:
https://csharp-source.net/open-source/pdf-libraries
https://sourceforge.net/projects/clown/
https://www.oipapio.com/question-3526089
Disclaimer: A part of this answer is taken from my my personal web site https://itbackyard.com/merge-multiple-pdf-files-to-one-pdf-file-in-c/ with source code to github.
This answer from Stack Overflow (Combine two (or more) PDF's) by Andrew Burns works for me:
using (PdfDocument one = PdfReader.Open("pdf 1.pdf", PdfDocumentOpenMode.Import))
using (PdfDocument two = PdfReader.Open("pdf 2.pdf", PdfDocumentOpenMode.Import))
using (PdfDocument outPdf = new PdfDocument())
{
CopyPages(one, outPdf);
CopyPages(two, outPdf);
outPdf.Save("file1and2.pdf");
}
void CopyPages(PdfDocument from, PdfDocument to)
{
for (int i = 0; i < from.PageCount; i++)
{
to.AddPage(from.Pages[i]);
}
}
That's not quite how PDFs work. PDFs are structured files in a specific format.
You can't just append the bytes of one to the other and expect the result to be a valid document.
You're going to have to use a library that understands the format and can do the operation for you, or developing your own solution.
PDF files aren't just text and images. Behind the scenes there is a strict file format that describes things like PDF version, the objects contained in the file and where to find them.
In order to merge 2 PDFs you'll need to manipulate the streams.
First you'll need to conserve the header from only one of the files. This is pretty easy since it's just the first line.
Then you can write the body of the first page, and then the second.
Now the hard part, and likely the part that will convince you to use a library, is that you have to re-build the xref table. The xref table is a cross reference table that describes the content of the document and more importantly where to find each element. You'd have to calculate the byte offset of the second page, shift all of the elements in it's xref table by that much, and then add it's xref table to the first. You'll also need to ensure you create objects in the xref table for the page break.
Once that's done, you need to re-build the document trailer which tells an application where the various sections of the document are among other things.
See https://resources.infosecinstitute.com/pdf-file-format-basic-structure/
This is not trivial and you'll end up re-writing lots of code that already exists.
I am creating a pdf document using C# code in my process. I need to protect the docuemnt
with some standard password like "123456" or some account number. I need to do this without
any reference dlls like pdf writer.
I am generating the PDF file using SQL Reporting services reports.
Is there are easiest way.
I am creating a pdf document using C#
code in my process
Are you using some library to create this document? The pdf specification (8.6MB) is quite big and all tasks involving pdf manipulation could be difficult without using a third party library. Password protecting and encrypting your pdf files with the free and open source itextsharp library is quite easy:
using (Stream input = new FileStream("test.pdf", FileMode.Open, FileAccess.Read, FileShare.Read))
using (Stream output = new FileStream("test_encrypted.pdf", FileMode.Create, FileAccess.Write, FileShare.None))
{
PdfReader reader = new PdfReader(input);
PdfEncryptor.Encrypt(reader, output, true, "secret", "secret", PdfWriter.ALLOW_PRINTING);
}
It would be very difficult to do this without using a PDF library. Basically, you'll need to develop such library yourselves.
With help of a PDF library everything is much simpler. Here is a sample that shows how a document can easily be protected using Docotic.Pdf library:
public static void protectWithPassword(string input, string output)
{
using (PdfDocument doc = new PdfDocument(input))
{
// set owner password (a password required to change permissions)
doc.OwnerPassword = "pass";
// set empty user password (this will allow anyone to
// view document without need to enter password)
doc.UserPassword = "";
// setup encryption algorithm
doc.Encryption = PdfEncryptionAlgorithm.Aes128Bit;
// [optionally] setup permissions
doc.Permissions.CopyContents = false;
doc.Permissions.ExtractContents = false;
doc.Save(output);
}
}
Disclaimer: I work for the vendor of the library.
If anyone is looking for a IText7 reference.
private string password = "#d45235fewf";
private const string pdfFile = #"C:\Temp\Old.pdf";
private const string pdfFileOut = #"C:\Temp\New.pdf";
public void DecryptPdf()
{
//Set reader properties and password
ReaderProperties rp = new ReaderProperties();
rp.SetPassword(new System.Text.UTF8Encoding().GetBytes(password));
//Read the PDF and write to new pdf
using (PdfReader reader = new PdfReader(pdfFile, rp))
{
reader.SetUnethicalReading(true);
PdfDocument pdf = new PdfDocument(reader, new PdfWriter(pdfFileOut));
pdf.GetFirstPage(); // Get at the very least the first page
}
}
I am working with ClosedXML utility for excel operations. It only supports files created on office versions 2007 on-words. I have an xls file and required to convert to xlms(macro enabled). Simple copy as shown below is not working.
string oldFile = #"C:\Files\xls.xls";
string newFile = #"C:\Files\xlsnew.xlsm";
File.Copy(xls_file,xlsm_file);
And I have used below code also
byte[] byteArray = File.ReadAllBytes(oldFile);
using (MemoryStream stream = new MemoryStream())
{
stream.Write(byteArray, 0, (int)byteArray.Length);
using (SpreadsheetDocument spreadsheetDoc = SpreadsheetDocument.Open(stream, true))
{
// Change from template type to workbook type
spreadsheetDoc.ChangeDocumentType(SpreadsheetDocumentType.MacroEnabledWorkbook);
}
File.WriteAllBytes(newFile, stream.ToArray());
}
Please provide a helpful solution.
Thanks in advance.
I'm using the EPPLUS library to read data from Excel to create another file. Unfortunately it does not support the .XLSM extension file. Is there a nice way to convert .XLSM files to .XLSX file for the purpose of reading the file with EPPLUS?
(using EPPLUS for reading would be nice because all my code is already written using it :) )
In order to do this you will need to use the Open XML SDK 2.0. Below is a snippet of code that worked for me when I tried it:
byte[] byteArray = File.ReadAllBytes("C:\\temp\\test.xlsm");
using (MemoryStream stream = new MemoryStream())
{
stream.Write(byteArray, 0, (int)byteArray.Length);
using (SpreadsheetDocument spreadsheetDoc = SpreadsheetDocument.Open(stream, true))
{
// Change from template type to workbook type
spreadsheetDoc.ChangeDocumentType(SpreadsheetDocumentType.Workbook);
}
File.WriteAllBytes("C:\\temp\\test.xlsx", stream.ToArray());
}
What this code does is it takes your macro enabled workbook file and opens it into a SpreadsheetDocument object. The type of this object is MacroEnabledWorkbook, but since you want it as a Workbook you call the ChangeDocumentType method to change it from a MacroEnabledWorkbook to a Workbook. This will work since the underlying XML is the same between a .xlsm and a .xlsx file.
Using the Open XML SDK, like in amurra's answer, but
in addition to changing doc type, VbaDataPart and VbaProjectPart should be removed, otherwise Excel will show error a file is corrupted.
using (var inputStream = File.OpenRead("C:\\temp\\test.xlsm"))
using (var outStream = new MemoryStream()) {
inputStream.CopyTo(outStream);
using (var doc = SpreadsheetDocument.Open(outStream, true)) {
doc.DeletePartsRecursivelyOfType<VbaDataPart>();
doc.DeletePartsRecursivelyOfType<VbaProjectPart>();
doc.ChangeDocumentType(DocumentFormat.OpenXml.SpreadsheetDocumentType.Workbook);
}
File.WriteAllBytes("C:\\temp\\test.xlsx", outStream.ToArray());
}
package xlsbtoxlsx;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.regex.Pattern;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.openxml4j.opc.PackageRelationship;
import org.apache.poi.openxml4j.opc.PackageRelationshipCollection;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbookType;
public class XlsbToXlsxConvertor {
public static void main(String[] args) throws Exception {
String inputpath="C:\\Excel Data Files\\XLSB\\CSD_TDR_20200823";
String outputpath="C:\\Excel Data Files\\XLSB\\output";
new XlsbToXlsxConvertor().xlsmToxlsxFileConvertor(inputpath, outputpath);
}
public void xlsmToxlsxFileConvertor(String inputpath, String outputpath) throws Exception {
XSSFWorkbook workbook;
FileOutputStream out;
System.out.println("inputpath " + inputpath);
File directoryPath = new File(inputpath);
// List of all files and directories
String contents[] = directoryPath.list();
System.out.println("List of files and directories in the specified directory:");
for (int i = 0; i < contents.length; i++) {
System.out.println(contents[i]);
// create workbook from XLSM template
workbook = (XSSFWorkbook) WorkbookFactory
.create(new FileInputStream(inputpath + File.separator + contents[i]));
// save copy as XLSX ----------------START
OPCPackage opcpackage = workbook.getPackage();
// get and remove the vbaProject.bin part from the package
PackagePart vbapart = opcpackage.getPartsByName(Pattern.compile("/xl/vbaProject.bin")).get(0);
opcpackage.removePart(vbapart);
// get and remove the relationship to the removed vbaProject.bin part from the
// package
PackagePart wbpart = workbook.getPackagePart();
PackageRelationshipCollection wbrelcollection = wbpart
.getRelationshipsByType("http://schemas.microsoft.com/office/2006/relationships/vbaProject");
for (PackageRelationship relship : wbrelcollection) {
wbpart.removeRelationship(relship.getId());
}
// set content type to XLSX
workbook.setWorkbookType(XSSFWorkbookType.XLSX);
// write out the XLSX
out = new FileOutputStream(outputpath + File.separator + contents[i].replace(".xlsm", "") + ".xlsx");
workbook.write(out);
out.close();
System.out.println("done");
workbook.close();
}
}
}
I can successfully inject a piece of VBA code into a generated excel workbook, but what I am trying to do is use the Workbook_Open() event so the VBA code executes when the file opens. I am adding the sub to the "ThisWorkbook" object in my xlsm template file. I then use the openxml productivity tool to reflect the code and get the encoded VBA data.
When the file is generated and I view the VBA, I see "ThisWorkbook" and "ThisWorkbook1" objects. My VBA is in "ThisWorkbook" object but the code never executes on open. If I move my VBA code to "ThisWorkbook1" and re-open the file, it works fine. Why is an extra "ThisWorkbook" created? Is it not possible to inject an excel spreadsheet with a Workbook_Open() sub? Here is a snippet of the C# code I am using:
private string partData = "..."; //base 64 encoded data from reflection code
//open workbook, myWorkbook
VbaProjectPart newPart = myWorkbook.WorkbookPart.AddNewPart<VbaProjectPart>("rId1");
System.IO.Stream data = GetBinaryDataStream(partData);
newPart.FeedData(data);
data.Close();
//save and close workbook
Anyone have ideas?
Based on my research there isn't a way to insert the project part data in a format that you can manipulate in C#. In the OpenXML format, the VBA project is still stored in a binary format. However, copying the VbaProjectPart from one Excel document into another should work. As a result, you'd have to determine what you wanted the project part to say in advance.
If you are OK with this, then you can add the following code to a template Excel file in the 'ThisWorkbook' Microsoft Excel Object, along with the appropriate Macro code:
Private Sub Workbook_Open()
Run "Module1.SomeMacroName()"
End Sub
To copy the VbaProjectPart object from one file to the other, you would use code like this:
public static void InsertVbaPart()
{
using(SpreadsheetDocument ssDoc = SpreadsheetDocument.Open("file1.xlsm", false))
{
WorkbookPart wbPart = ssDoc.WorkbookPart;
MemoryStream ms;
CopyStream(ssDoc.WorkbookPart.VbaProjectPart.GetStream(), ms);
using(SpreadsheetDocument ssDoc2 = SpreadsheetDocument.Open("file2.xlsm", true))
{
Stream stream = ssDoc2.WorkbookPart.VbaProjectPart.GetStream();
ms.WriteTo(stream);
}
}
}
public static void CopyStream(Stream input, Stream output)
{
byte[] buffer = new byte[short.MaxValue + 1];
while (true)
{
int read = input.Read(buffer, 0, buffer.Length);
if (read <= 0)
return;
output.Write(buffer, 0, read);
}
}
Hope that helps.
I found that the other answers still resulted in the duplicate "Worksheet" object. I used a similar solution to what #ZlotaMoneta said, but with a different syntax found here:
List<VbaProjectPart> newParts = new List<VbaProjectPart>();
using (var originalDocument = SpreadsheetDocument.Open("file1.xlsm"), false))
{
newParts = originalDocument.WorkbookPart.GetPartsOfType<VbaProjectPart>().ToList();
using (var document = SpreadsheetDocument.Open("file2.xlsm", true))
{
document.WorkbookPart.DeleteParts(document.WorkbookPart.GetPartsOfType<VbaProjectPart>());
foreach (var part in newParts)
{
VbaProjectPart vbaProjectPart = document.WorkbookPart.AddNewPart<VbaProjectPart>();
using (Stream data = part.GetStream())
{
vbaProjectPart.FeedData(data);
}
}
//Note this prevents the duplicate worksheet issue
spreadsheetDocument.WorkbookPart.Workbook.WorkbookProperties.CodeName = "ThisWorkbook";
}
}
You need to specify "codeName" attribute in the "xl/workbook..xml" object
After feeding the VbaProjectPart with macro. Add this code:
var workbookPr = spreadsheetDocument.WorkbookPart.Workbook.Descendants<WorkbookProperties>().FirstOrDefault();
workbookPr.CodeName = "ThisWorkBook";
After opening the file everything should work now.
So, to add macro you need to:
Change document type to macro enabled
Add VbaProjectPart and feed it with earlier created macro
Add workbookPr codeName attr in xl/workbook..xml with value "ThisWorkBook"
Save as with .xlsm ext.