Removing vbaProject.bin in XML not enough to save as xlsx - c#

After exporting data into a macro-enabled Excel workbook (xlsm), I run the macro and then remove it so that the workbook can be saved as xlsx. To remove the macro, I open the xlsm as a zip archive (via the C# ZipFile class), delete the entry "xl/vbaProject.bin", and remove the corresponding relationship from "xl/_rels/workbook.xml.rels". Then I rename the file from xlsm to xlsx. All of that runs without errors, but when I open the xlsx file in Excel, I get "Excel cannot open the file because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file". Something else must be needed to completely remove the VBA code from the workbook. Can anyone help me here?
const string vbaProjectEntryName = "xl/vbaProject.bin"; // Contains the VBA code
const string relationsEntryName = "xl/_rels/workbook.xml.rels"; // Relation/Link to the vba project
using (var zip = ZipFile.Open(fileName, ZipArchiveMode.Update))
{
var entry = zip.GetEntry(vbaProjectEntryName);
if (entry != null)
{
entry.Delete();
entry = zip.GetEntry(relationsEntryName);
if (entry != null)
{
var contents = string.Empty;
using (var streamReader = new StreamReader(entry.Open()))
{
contents = streamReader.ReadToEnd();
}
var relationText = "<Relationship Id=\"rId6\" Type=\"http://schemas.microsoft.com/office/2006/relationships/vbaProject\" Target=\"vbaProject.bin\"/>";
contents = contents.Replace(relationText, string.Empty);
entry.Delete();
entry = zip.CreateEntry(relationsEntryName);
using (var streamWriter = new StreamWriter(entry.Open()))
{
streamWriter.Write(contents);
}
}
}
}
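One part the steps above leave untouched is [Content_Types].xml, which still declares the main workbook part with the macro-enabled content type, and that appears to be what Excel compares against the .xlsx extension. The following is a minimal sketch under that assumption, not a verified fix for this particular file. The helper names EditXmlEntry and StripVbaProject are hypothetical, and the part names are the standard ones used in the question. The relationship is removed by its Type rather than by the literal rId6 string, so the sketch does not depend on the Id Excel happened to assign.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Usage sketch: pass the path of a COPY of the workbook as the only argument.
if (args.Length == 1)
{
    using var zip = ZipFile.Open(args[0], ZipArchiveMode.Update);
    StripVbaProject(zip);
}

// Hypothetical helper: load an XML entry, apply an edit, write it back in place.
static void EditXmlEntry(ZipArchive zip, string entryName, Action<XDocument> edit)
{
    var entry = zip.GetEntry(entryName);
    if (entry == null) return;
    using var stream = entry.Open();   // read/write and seekable in Update mode
    var doc = XDocument.Load(stream);
    edit(doc);
    stream.SetLength(0);               // truncate before writing the edited XML
    doc.Save(stream);
}

// Hypothetical helper: remove the VBA part and every reference to it.
static void StripVbaProject(ZipArchive zip)
{
    XNamespace rel = "http://schemas.openxmlformats.org/package/2006/relationships";
    XNamespace ct = "http://schemas.openxmlformats.org/package/2006/content-types";
    const string vbaRelType = "http://schemas.microsoft.com/office/2006/relationships/vbaProject";
    const string macroType = "application/vnd.ms-excel.sheet.macroEnabled.main+xml";
    const string plainType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml";

    // 1. The binary part that holds the VBA code.
    zip.GetEntry("xl/vbaProject.bin")?.Delete();

    // 2. The workbook's relationship to that part, matched by Type.
    EditXmlEntry(zip, "xl/_rels/workbook.xml.rels", doc =>
        doc.Descendants(rel + "Relationship")
           .Where(r => (string)r.Attribute("Type") == vbaRelType)
           .Remove());

    // 3. The content-type declarations: drop any override for the VBA part
    //    and mark the workbook part as a plain (macro-free) workbook.
    EditXmlEntry(zip, "[Content_Types].xml", doc =>
    {
        doc.Descendants(ct + "Override")
           .Where(o => (string)o.Attribute("PartName") == "/xl/vbaProject.bin")
           .Remove();
        foreach (var o in doc.Descendants(ct + "Override")
                             .Where(o => (string)o.Attribute("ContentType") == macroType)
                             .ToList())
        {
            o.SetAttributeValue("ContentType", plainType);
        }
    });
}
```

Editing the entries in place, rather than deleting and re-creating them, keeps the original entry order in the archive. If the workbook also contains a vbaData part, that would need the same treatment; this sketch does not cover it.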

Related

How to get an xls file from a ZipArchiveEntry with EPPlus (C#)

I'm trying to read an xls file from a ZipArchive, but I can't get it to work with EPPlus.
foreach (ZipArchiveEntry entry in archive.Entries)
{
if (entry != null)
{
string filepath = entry.FullName;
FileInfo fileInfo = new FileInfo(filepath);
// here I get the ExcelPackage with the xls file inside it
using (ExcelPackage excelPackage = new ExcelPackage(fileInfo))
{
// but here it is impossible to get the worksheet, the workbook, or anything else
ExcelWorksheet worksheet = excelPackage.Workbook.Worksheets.FirstOrDefault();
int totalColomn = worksheet.Dimension.End.Column;
int nbrsheet = excelPackage.Workbook.Worksheets.Count();
}
}
}
[Screenshot: the ExcelPackage as shown in the debugger]
In the debugger I can see the xls file inside the ExcelPackage, but as soon as I try to get a worksheet, execution exits without an exception.
The same happens when I try with the entry stream:
using (var entryStream = entry.Open())
{
// Can't even create the ExcelPackage; it crashes here without an exception
using (ExcelPackage excelPackage = new ExcelPackage(entryStream))
{
ExcelWorksheet worksheetest = excelPackage.Workbook.Worksheets.FirstOrDefault();
}
}
The stream here also looks strange:
[Screenshot: entryStream as shown in the debugger]
I'm working with .NET Core Blazor Server-Side and EPPlus 4.5.
Thanks for helping.
entry.FullName refers to the full path to the file inside the zip archive, while FileInfo describes a file in the filesystem of the OS, which is a completely different thing. You haven't extracted anything to the OS filesystem yet, so the FileInfo won't refer to a file that actually exists.
Try the ExcelPackage constructor that takes a Stream, which you can get directly from a ZipArchiveEntry:
using (var entryStream = entry.Open())
{
using (ExcelPackage excelPackage = new ExcelPackage(entryStream))
{
// ...
}
}
I found the problem.
I was trying to open an xls file, and the EPPlus library doesn't work with that format.
So be careful: EPPlus doesn't work with xls files.
Your solution is working, Jeff. It was my fault: I didn't specify the extension of my Excel file, sorry.
-> EPPlus with .xlsx works, with .xls it does not.
My bad.
Thanks anyway :-)
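Since the file extension was the missing piece of information here, one way to catch this earlier is to look at the first bytes of the entry before handing it to EPPlus. The sketch below is illustrative only: BufferEntry and DetectContainer are hypothetical helper names. It relies on two well-known signatures: xlsx/xlsm files are zip archives starting with "PK\x03\x04", while legacy xls files are OLE2 compound files starting with D0 CF 11 E0 A1 B1 1A E1. It also copies the entry into a MemoryStream first, because the stream returned by entry.Open() on an archive opened for reading is a forward-only decompression stream (no Length, no Seek), which may be why it looks strange in the debugger.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Usage sketch: list each entry of a zip file with the container type of its content.
if (args.Length == 1)
{
    using var archive = ZipFile.OpenRead(args[0]);
    foreach (var entry in archive.Entries)
    {
        using var entryStream = entry.Open();
        using var buffered = BufferEntry(entryStream);
        Console.WriteLine($"{entry.FullName}: {DetectContainer(buffered)}");
    }
}

// Copy a forward-only entry stream into memory so it can be inspected and rewound.
static MemoryStream BufferEntry(Stream entryStream)
{
    var buffer = new MemoryStream();
    entryStream.CopyTo(buffer);
    buffer.Position = 0;
    return buffer;
}

// Hypothetical helper: returns "zip" (xlsx/xlsm), "ole2" (legacy xls), or "unknown".
// Expects the stream to be positioned at its start.
static string DetectContainer(MemoryStream workbook)
{
    byte[] zipMagic = { 0x50, 0x4B, 0x03, 0x04 };
    byte[] oleMagic = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    var header = new byte[8];
    int read = workbook.Read(header, 0, header.Length);
    workbook.Position = 0; // rewind so the caller can still load the workbook

    if (read >= 4 && header.Take(4).SequenceEqual(zipMagic)) return "zip";
    if (read == 8 && header.SequenceEqual(oleMagic)) return "ole2";
    return "unknown";
}
```

Only a stream reported as "zip" is worth passing to new ExcelPackage(stream); an "ole2" result means the file is in the old binary format and needs a different library or a conversion first.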

File won't open after saving

I have some code that is supposed to enter values into several Excel workbooks. Right now the program doesn't even put any values into the workbooks; it only saves them. Even so, I get this error when opening the files: Excel cannot open the file **.xlsm because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file.
I have written many programs that work with Excel files and never had this problem. In the code you can see that I basically just go through a for loop and save the file.
try
{
fileInfo = new FileInfo(Path.GetDirectoryName(Application.StartupPath) + '\\' + partners[partner].partnerName + @"\PDP_ExSumm_" + partners[partner].partnerName + ".xlsm");
using (ExcelPackage excelPackage = new ExcelPackage(fileInfo))
{
ExcelWorksheet worksheet = excelPackage.Workbook.Worksheets[1];
for (int cell = 0; cell < ExSummCells.Count; cell++)
{
if (ExSummCells[cell] != "")
{
// worksheet.Cells[ExSummCells[cell]].Value = partners[partner].exSummData[partner];
}
excelPackage.Save();
}
}

Replacing Invalid XML characters from an excel file and writing it back to disk causes file is corrupted error on opening in MS Excel

A little background on problem:
We have an ASP.NET MVC5 Application where we use FlexMonster to show the data in grid. The data source is a stored procedure that brings all the data into the UI grid, and once user clicks on export button, it exports the report to Excel. However, in some cases export to excel is failing.
Some of the data has some invalid characters, and it is not possible/feasible to fix the source as suggested here
My approach so far:
The EPPlus library fails when initializing the workbook because the input Excel file contains some invalid XML characters. I found that the file is written out with an invalid character in it, so I looked into possible approaches.
First, I identified the problematic character in the Excel file. I replaced it with a blank space manually using Notepad++, and EPPlus could then read the file successfully.
Now, using the approaches given in other SO threads here and here, I replace all possible occurrences of invalid characters. At the moment I am using the
XmlConvert.IsXmlChar
method to find the problematic XML characters and replace them with blank spaces.
I created a sample program where I am trying to work on the problematic excel sheet.
//in main method
String readFile = File.ReadAllText(filePath);
string content = RemoveInvalidXmlChars(readFile);
File.WriteAllText(filePath, content);
//removal of invalid characters
static string RemoveInvalidXmlChars(string inputText)
{
StringBuilder withoutInvalidXmlCharsBuilder = new StringBuilder();
int firstOccurenceOfRealData = inputText.IndexOf("<t>");
int lastOccurenceOfRealData = inputText.LastIndexOf("</t>");
if (firstOccurenceOfRealData < 0 ||
lastOccurenceOfRealData < 0 ||
firstOccurenceOfRealData > lastOccurenceOfRealData)
return inputText;
withoutInvalidXmlCharsBuilder.Append(inputText.Substring(0, firstOccurenceOfRealData));
int remaining = lastOccurenceOfRealData - firstOccurenceOfRealData;
string textToCheckFor = inputText.Substring(firstOccurenceOfRealData, remaining);
foreach (char c in textToCheckFor)
{
withoutInvalidXmlCharsBuilder.Append((XmlConvert.IsXmlChar(c)) ? c : ' ');
}
withoutInvalidXmlCharsBuilder.Append(inputText.Substring(lastOccurenceOfRealData));
return withoutInvalidXmlCharsBuilder.ToString();
}
If I replace the problematic character manually using Notepad++, the file opens fine in MS Excel. The code above successfully replaces the same invalid character and writes the content back to the file. However, when I try to open the resulting file in MS Excel, it reports that the file may be corrupted and displays no content (screenshots below). Moreover, the following code
var excelPackage = new ExcelPackage(new FileInfo(filePath));
on the file that I updated via Notepad++ throws the following exception:
"CRC error: the file being extracted appears to be corrupted. Expected 0x7478AABE, Actual 0xE9191E00"
My Questions:
Is my approach to modify content this way correct?
If yes, How can I write updated string to an Excel file?
If my approach is wrong then, How can I proceed to get rid of invalid XML chars?
Errors shown on opening the file (without the invalid XML char):
[Screenshots: the first pop-up, and the dialog shown after clicking Yes]
Thanks in advance !
It does sound like a binary (presumably XLSX) file, based on your last comment. To confirm, open the file created by FlexMonster with 7-Zip. If it opens properly and you see a bunch of XML files in folders, it's an XLSX.
In that case, a search/replace on the binary file is a very bad idea. It might work on the XML parts but might also replace legitimate characters in other parts. I think the better approach would be to do as @PanagiotisKanavos suggests and use ZipArchive. But you have to rebuild the archive in the right order, otherwise Excel complains. Similar to how it was done here https://stackoverflow.com/a/33312038/1324284, you could do something like this:
public static void ReplaceXmlString(this ZipArchive xlsxZip, FileInfo outFile, string oldString, string newstring)
{
using (var outStream = outFile.Open(FileMode.Create, FileAccess.ReadWrite))
using (var copiedzip = new ZipArchive(outStream, ZipArchiveMode.Update))
{
//Go through each file in the zip one by one and copy it over to the new file - entries need to be in order
foreach (var entry in xlsxZip.Entries)
{
var newentry = copiedzip.CreateEntry(entry.FullName);
var newstream = newentry.Open();
var orgstream = entry.Open();
//Copy non-xml files over
if (!entry.Name.EndsWith(".xml"))
{
orgstream.CopyTo(newstream);
}
else
{
//Load the xml document to manipulate
var xdoc = new XmlDocument();
xdoc.Load(orgstream);
var xml = xdoc.OuterXml.Replace(oldString, newstring);
xdoc = new XmlDocument();
xdoc.LoadXml(xml);
xdoc.Save(newstream);
}
orgstream.Close();
newstream.Flush();
newstream.Close();
}
}
}
When it is used like this:
[TestMethod]
public void ReplaceXmlTest()
{
var datatable = new DataTable("tblData");
datatable.Columns.AddRange(new[]
{
new DataColumn("Col1", typeof (int)),
new DataColumn("Col2", typeof (int)),
new DataColumn("Col3", typeof (string))
});
for (var i = 0; i < 10; i++)
{
var row = datatable.NewRow();
row[0] = i;
row[1] = i * 10;
row[2] = i % 2 == 0 ? "ABCD" : "AXCD";
datatable.Rows.Add(row);
}
using (var pck = new ExcelPackage())
{
var workbook = pck.Workbook;
var worksheet = workbook.Worksheets.Add("source");
worksheet.Cells.LoadFromDataTable(datatable, true);
worksheet.Tables.Add(worksheet.Cells["A1:C11"], "Table1");
//Now simulate the copy/open of the excel file into a zip archive
using (var orginalzip = new ZipArchive(new MemoryStream(pck.GetAsByteArray()), ZipArchiveMode.Read))
{
var fi = new FileInfo(@"c:\temp\ReplaceXmlTest.xlsx");
if (fi.Exists)
fi.Delete();
orginalzip.ReplaceXmlString(fi, "AXCD", "REPLACED!!");
}
}
}
Gives this: [screenshot of the resulting worksheet]
Just keep in mind that this is completely brute force. Anything you can do to make the file filter smarter than simply processing ALL XML files would be a very good thing. Maybe limit it to the sharedStrings.xml file, if that is where the problem lies, or to the XML files in the worksheets folder. Hard to say without knowing more about the data.
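A narrower variant of the idea, restricted to the shared strings part and aimed at the invalid characters from the question, could look roughly like the sketch below. It is an illustration under stated assumptions rather than a drop-in fix: the helper names ReplaceInvalidXmlChars and CopyWithCleanSharedStrings are hypothetical, the offending text is assumed to live in xl/sharedStrings.xml, and that part is assumed to be UTF-8. The part is handled as text instead of through XmlDocument, because an XML parser would reject the invalid characters before they could be replaced. It will not catch invalid characters written as numeric character references such as &#x1;.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml;

// Usage sketch: first argument is the damaged workbook, second the cleaned copy.
if (args.Length == 2)
{
    using var source = File.OpenRead(args[0]);
    using var target = File.Create(args[1]);
    CopyWithCleanSharedStrings(source, target);
}

// Replace characters that XML does not allow with a space; keep valid surrogate pairs.
static string ReplaceInvalidXmlChars(string text)
{
    var cleaned = new StringBuilder(text.Length);
    for (int i = 0; i < text.Length; i++)
    {
        char c = text[i];
        if (i + 1 < text.Length && char.IsSurrogatePair(c, text[i + 1]))
        {
            cleaned.Append(c).Append(text[i + 1]); // supplementary-plane character
            i++;
        }
        else
        {
            cleaned.Append(XmlConvert.IsXmlChar(c) ? c : ' ');
        }
    }
    return cleaned.ToString();
}

// Rebuild the archive entry by entry, in the order the entries are listed,
// cleaning only the shared strings part and copying everything else unchanged.
static void CopyWithCleanSharedStrings(Stream source, Stream target)
{
    using var input = new ZipArchive(source, ZipArchiveMode.Read, leaveOpen: true);
    using var output = new ZipArchive(target, ZipArchiveMode.Create, leaveOpen: true);
    foreach (var entry in input.Entries)
    {
        using var src = entry.Open();
        using var dst = output.CreateEntry(entry.FullName).Open();
        if (entry.FullName.Equals("xl/sharedStrings.xml", StringComparison.OrdinalIgnoreCase))
        {
            using var reader = new StreamReader(src, Encoding.UTF8);
            using var writer = new StreamWriter(dst, new UTF8Encoding(false));
            writer.Write(ReplaceInvalidXmlChars(reader.ReadToEnd()));
        }
        else
        {
            src.CopyTo(dst); // byte-for-byte copy
        }
    }
}
```

Because every entry is rewritten through ZipArchive, the checksums stored in the archive are recomputed, which avoids the CRC error that comes from editing the raw .xlsx bytes as text.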

Create a temporary excel file with epplus?

After adding everything I want to a newly created Excel file with EPPlus, how do I just open it? I don't want to save the file first and then open it. Is this possible? I want it to simply open and let the user decide whether to save it or not.
The only code I've found and tried so far generates a file name, saves the Excel file, and then opens it.
Byte[] bin = p.GetAsByteArray();
string file = Guid.NewGuid().ToString() + ".xlsx";
File.WriteAllBytes(file, bin);
ProcessStartInfo pi = new ProcessStartInfo(file);
Process.Start(pi);
EPPlus generates the XML inside a renamed zip file, so there is no mechanism to hand it to Excel without saving it somewhere. But you can always save to the user's temp folder; this is what most programs have to do at some point in order to transfer files between each other. You can do something like this using System.IO.Path.GetTempPath():
[TestMethod]
public void TempFolderTest()
{
var path = Path.Combine(Path.GetTempPath(), "temp.xlsx");
var tempfile = new FileInfo(path);
if (tempfile.Exists)
tempfile.Delete();
//Save the file
using (var pck = new ExcelPackage(tempfile))
{
var ws = pck.Workbook.Worksheets.Add("Demo");
ws.Cells[1, 2].Value = "Excel Test";
pck.Save();
}
//open the file
Process.Start(tempfile.FullName);
}
(taken from: Open ExcelPackage Object with Excel application without saving it on local file path)
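A small variation on the same idea, sketched here with hypothetical helper names (WriteTempWorkbook, OpenWithDefaultApp), gives every export its own file name in the temp folder. That avoids a failed delete or overwrite when a previous "temp.xlsx" is still open in Excel. The byte array is assumed to come from something like p.GetAsByteArray() in the question. On .NET Core and later, UseShellExecute has to be set explicitly for the operating system to open a document with its associated application.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Usage sketch: pass the path of an existing .xlsx to copy it to the temp folder and open it.
if (args.Length == 1)
{
    OpenWithDefaultApp(WriteTempWorkbook(File.ReadAllBytes(args[0])));
}

// Write the workbook bytes to a uniquely named .xlsx in the user's temp folder.
static string WriteTempWorkbook(byte[] workbookBytes)
{
    string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".xlsx");
    File.WriteAllBytes(path, workbookBytes);
    return path;
}

// Ask the operating system to open the file with whatever application is registered for it.
static void OpenWithDefaultApp(string path)
{
    // UseShellExecute defaults to false on .NET Core and later, so set it explicitly.
    Process.Start(new ProcessStartInfo(path) { UseShellExecute = true });
}
```

The file still exists on disk, but from the user's point of view the workbook simply opens, and choosing where to keep it is left to Excel's own Save As.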
It's not possible: you can't create a temporary Excel file with EPPlus.

Convert XLSM to XLSX

I'm using the EPPLUS library to read data from Excel to create another file. Unfortunately it does not support the .XLSM extension file. Is there a nice way to convert .XLSM files to .XLSX file for the purpose of reading the file with EPPLUS?
(using EPPLUS for reading would be nice because all my code is already written using it :) )
In order to do this you will need to use the Open XML SDK 2.0. Below is a snippet of code that worked for me when I tried it:
byte[] byteArray = File.ReadAllBytes("C:\\temp\\test.xlsm");
using (MemoryStream stream = new MemoryStream())
{
stream.Write(byteArray, 0, (int)byteArray.Length);
using (SpreadsheetDocument spreadsheetDoc = SpreadsheetDocument.Open(stream, true))
{
// Change from macro-enabled workbook type to plain workbook type
spreadsheetDoc.ChangeDocumentType(SpreadsheetDocumentType.Workbook);
}
File.WriteAllBytes("C:\\temp\\test.xlsx", stream.ToArray());
}
What this code does is it takes your macro enabled workbook file and opens it into a SpreadsheetDocument object. The type of this object is MacroEnabledWorkbook, but since you want it as a Workbook you call the ChangeDocumentType method to change it from a MacroEnabledWorkbook to a Workbook. This will work since the underlying XML is the same between a .xlsm and a .xlsx file.
Use the Open XML SDK, as in amurra's answer, but in addition to changing the document type, remove the VbaDataPart and VbaProjectPart; otherwise Excel will report that the file is corrupted.
using (var inputStream = File.OpenRead("C:\\temp\\test.xlsm"))
using (var outStream = new MemoryStream()) {
inputStream.CopyTo(outStream);
using (var doc = SpreadsheetDocument.Open(outStream, true)) {
doc.DeletePartsRecursivelyOfType<VbaDataPart>();
doc.DeletePartsRecursivelyOfType<VbaProjectPart>();
doc.ChangeDocumentType(DocumentFormat.OpenXml.SpreadsheetDocumentType.Workbook);
}
File.WriteAllBytes("C:\\temp\\test.xlsx", outStream.ToArray());
}
package xlsbtoxlsx;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.regex.Pattern;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.openxml4j.opc.PackageRelationship;
import org.apache.poi.openxml4j.opc.PackageRelationshipCollection;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbookType;
public class XlsbToXlsxConvertor {
public static void main(String[] args) throws Exception {
String inputpath="C:\\Excel Data Files\\XLSB\\CSD_TDR_20200823";
String outputpath="C:\\Excel Data Files\\XLSB\\output";
new XlsbToXlsxConvertor().xlsmToxlsxFileConvertor(inputpath, outputpath);
}
public void xlsmToxlsxFileConvertor(String inputpath, String outputpath) throws Exception {
XSSFWorkbook workbook;
FileOutputStream out;
System.out.println("inputpath " + inputpath);
File directoryPath = new File(inputpath);
// List of all files and directories
String contents[] = directoryPath.list();
System.out.println("List of files and directories in the specified directory:");
for (int i = 0; i < contents.length; i++) {
System.out.println(contents[i]);
// create workbook from XLSM template
workbook = (XSSFWorkbook) WorkbookFactory
.create(new FileInputStream(inputpath + File.separator + contents[i]));
// save copy as XLSX ----------------START
OPCPackage opcpackage = workbook.getPackage();
// get and remove the vbaProject.bin part from the package
PackagePart vbapart = opcpackage.getPartsByName(Pattern.compile("/xl/vbaProject.bin")).get(0);
opcpackage.removePart(vbapart);
// get and remove the relationship to the removed vbaProject.bin part from the
// package
PackagePart wbpart = workbook.getPackagePart();
PackageRelationshipCollection wbrelcollection = wbpart
.getRelationshipsByType("http://schemas.microsoft.com/office/2006/relationships/vbaProject");
for (PackageRelationship relship : wbrelcollection) {
wbpart.removeRelationship(relship.getId());
}
// set content type to XLSX
workbook.setWorkbookType(XSSFWorkbookType.XLSX);
// write out the XLSX
out = new FileOutputStream(outputpath + File.separator + contents[i].replace(".xlsm", "") + ".xlsx");
workbook.write(out);
out.close();
System.out.println("done");
workbook.close();
}
}
}
