I have an Excel xlsx document with hyperlinks.
The hyperlinks have addresses and subaddresses (that's what VBA calls HTML fragments, i.e. everything after the # sign).
The EPPlus library has a Hyperlink property for every cell, but it holds only the first part of the address, so instead of
stackoverflow.com#footer
I got:
stackoverflow.com
Is there any way to read the HTML fragment part with this library?
Code for reading hyperlinks via EPPlus:
FileInfo xlsxFile = new FileInfo(_filePath);
using (ExcelPackage pck = new ExcelPackage(xlsxFile))
{
    var wb = pck.Workbook;
    if (wb == null)
        return null;
    var ws = wb.Worksheets.FirstOrDefault();
    // EPPlus cell indexes are 1-based, so Cells[0,0] would throw; A1 is Cells[1,1]
    ExcelRange er = ws.Cells[1, 1];
    var hyperlink = er.Hyperlink; // Uri without the #fragment
    // ...
}
It seems to be an issue with the way Excel stores hyperlinks and the way EPPlus reads them. Excel stores hyperlinks both in the worksheet itself and in the worksheet's relationship file. That file records the cross-references between workbook parts (worksheets, styles, strings, etc.). This all comes from the structure of an xlsx file, which is a zip of XML parts based on the Office Open XML (OOXML) standard.
So the problem is that EPPlus relies on that relationship file, which does not contain the fragment. The `hyperlink` node in the worksheet XML does contain it, in its `location` attribute. You can see all of this in gory detail if you rename the xlsx file to .zip and open it.
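So, the short answer is that you are forced to use the `.Value` of the cell object. It's not as clean, but it works as long as the cell's text is the full URL. If the link has a friendly display text, `.Value` returns that text instead. For example, for a cell whose text is the full URL, reading it with this code: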
var fi = new FileInfo(@"c:\temp\Html_Fragment.xlsx");
using (var pck = new ExcelPackage(fi))
{
    var wb = pck.Workbook;
    var ws = wb.Worksheets.FirstOrDefault();
    ExcelRange er = ws.Cells[1, 1];
    // Value is the cell text (here the full URL including #Anchor_3);
    // Hyperlink.AbsoluteUri comes from the relationship target, without the fragment
    Console.WriteLine("{{Value: {0}, Hyperlink: {1}}}", er.Value, er.Hyperlink.AbsoluteUri);
}
Gives this:
{
Value: https://msdn.microsoft.com/en-us/library/aa982683(v=office.12).aspx#Anchor_3,
Hyperlink: https://msdn.microsoft.com/en-us/library/aa982683(v=office.12).aspx
}
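If you need the fragment even when the cell text is not the URL, one option is to skip EPPlus for this step and read the package parts directly with System.IO.Compression and LINQ to XML.

This is a sketch under some assumptions:

- The `<hyperlink>` element in the sheet XML carries `ref`, `r:id` and the fragment in `location`.
- The base URL is the `Target` of the matching relationship in the sheet's `_rels` file.

`HyperlinkReader`, `GetFullAddress` and the `xl/worksheets/sheet1.xml` part name are illustrative, not part of EPPlus. Check the real part name by unzipping your file.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Sketch: rebuild a cell's full hyperlink (base URL + "#" + fragment) from the raw package parts.
// Assumes the <hyperlink> element keeps the fragment in "location" and points to the base URL via r:id.
public static class HyperlinkReader
{
    static readonly XNamespace Main = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
    static readonly XNamespace Rel = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    static readonly XNamespace PkgRel = "http://schemas.openxmlformats.org/package/2006/relationships";

    // sheetPath is the part name inside the zip, e.g. "xl/worksheets/sheet1.xml"; cellRef is e.g. "A1".
    public static string GetFullAddress(Stream xlsx, string sheetPath, string cellRef)
    {
        using (var zip = new ZipArchive(xlsx, ZipArchiveMode.Read, leaveOpen: true))
        {
            XElement link = Load(zip, sheetPath).Descendants(Main + "hyperlink")
                .FirstOrDefault(h => (string)h.Attribute("ref") == cellRef);
            if (link == null) return null;

            string location = (string)link.Attribute("location"); // the part after '#', may be null
            string relId = (string)link.Attribute(Rel + "id");      // null for links inside the workbook
            if (relId == null) return location;

            // The relationships of "xl/worksheets/sheet1.xml" live in "xl/worksheets/_rels/sheet1.xml.rels"
            int slash = sheetPath.LastIndexOf('/');
            string relsPath = sheetPath.Substring(0, slash + 1) + "_rels/" + sheetPath.Substring(slash + 1) + ".rels";
            string target = (string)Load(zip, relsPath).Descendants(PkgRel + "Relationship")
                .First(r => (string)r.Attribute("Id") == relId)
                .Attribute("Target");

            return string.IsNullOrEmpty(location) ? target : target + "#" + location;
        }
    }

    static XDocument Load(ZipArchive zip, string partName)
    {
        ZipArchiveEntry entry = zip.GetEntry(partName)
            ?? throw new FileNotFoundException("Part not found in package", partName);
        using (Stream s = entry.Open())
            return XDocument.Load(s);
    }
}
```

Usage would be along the lines of `using (var fs = File.OpenRead(path)) Console.WriteLine(HyperlinkReader.GetFullAddress(fs, "xl/worksheets/sheet1.xml", "A1"));`. For links that point inside the workbook, only the `location` (e.g. `Sheet2!A1`) is returned.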
Related
In my C# .NET project I have a controller action that has two Excel SpreadsheetDocuments. I would like to take the first sheet of the second workbook and add it to the first workbook (so the first workbook will have two sheets).
My code currently looks like this:
SpreadsheetDocument doc1 = SpreadsheetDocument.Open(stream, true);
SpreadsheetDocument doc2 = SpreadsheetDocument.Open(stream2, true);
var breakSheet = doc2.WorkbookPart.Workbook.Sheets.FirstChild;
doc1.WorkbookPart.Workbook.Sheets.Append(breakSheet);
stream.Seek(0, SeekOrigin.Begin);
return File(stream, System.Net.Mime.MediaTypeNames.Application.Octet, String.Format(fileName));
However, the Append call throws the error "Cannot insert the OpenXmlElement "newChild" because it is part of a tree."
I know both SpreadsheetDocuments are valid. When I return either one individually, without attempting to combine them, both export successfully with the right data. So how can I successfully combine these two sheets?
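The error occurs because the Sheet element you are appending still belongs to doc2's tree. Even a clone would not help: it would only be a reference (name, sheetId, relationship id) to a WorksheetPart inside doc2, not the sheet data itself. If you can, I would suggest using ClosedXML and copying the worksheet with it instead.
There it might look like this: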
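using System.IO; using System.Linq; using ClosedXML.Excel;
private static void CopyWorksheet(Stream sourceStream, Stream targetStream)
{
    using (var wb1 = new XLWorkbook(sourceStream))
    using (var wb2 = new XLWorkbook(targetStream))
    {
        var sh1 = wb1.Worksheets.First();
        sh1.CopyTo(wb2, sh1.Name + " from wb1"); // sheet names are limited to 31 characters
        // Without saving, the copy exists only in memory; wb2 was loaded from
        // targetStream, so Save() writes back to it (the stream must be writable)
        wb2.Save();
    }
}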
A little background on the problem:
We have an ASP.NET MVC5 application where we use FlexMonster to show the data in a grid. The data source is a stored procedure that brings all the data into the UI grid, and once the user clicks the export button, the report is exported to Excel. However, in some cases the export to Excel fails.
Some of the data contains invalid characters, and it is not possible/feasible to fix the source as suggested here.
My approach so far:
The EPPlus library fails when initializing the workbook because the input Excel file contains some invalid XML characters. I found that the file is dumped with an invalid character in it, and I looked into the possible approaches.
First, I identified the problematic character in the Excel file. I tried replacing it with a blank space manually using Notepad++, and EPPlus could then read the file successfully.
Then, using the approaches given in other SO threads here and here, I replaced all possible occurrences of invalid characters. At the moment I am using the XmlConvert.IsXmlChar method to find the problematic XML characters and replace them with a blank space.
I created a sample program in which I try to fix the problematic Excel sheet.
//in main method
String readFile = File.ReadAllText(filePath);
string content = RemoveInvalidXmlChars(readFile);
File.WriteAllText(filePath, content);
//removal of invalid characters
static string RemoveInvalidXmlChars(string inputText)
{
StringBuilder withoutInvalidXmlCharsBuilder = new StringBuilder();
int firstOccurenceOfRealData = inputText.IndexOf("<t>");
int lastOccurenceOfRealData = inputText.LastIndexOf("</t>");
if (firstOccurenceOfRealData < 0 ||
lastOccurenceOfRealData < 0 ||
firstOccurenceOfRealData > lastOccurenceOfRealData)
return inputText;
withoutInvalidXmlCharsBuilder.Append(inputText.Substring(0, firstOccurenceOfRealData));
int remaining = lastOccurenceOfRealData - firstOccurenceOfRealData;
string textToCheckFor = inputText.Substring(firstOccurenceOfRealData, remaining);
foreach (char c in textToCheckFor)
{
withoutInvalidXmlCharsBuilder.Append((XmlConvert.IsXmlChar(c)) ? c : ' ');
}
withoutInvalidXmlCharsBuilder.Append(inputText.Substring(lastOccurenceOfRealData));
return withoutInvalidXmlCharsBuilder.ToString();
}
If I replace the problematic character manually using Notepad++, the file opens fine in MS Excel. The code above successfully replaces the same invalid character and writes the content back to the file. However, when I try to open the Excel file in MS Excel, it shows an error saying that the file may have been corrupted, and no content is displayed (snapshots below). Moreover, the following code
var excelPackage = new ExcelPackage(new FileInfo(filePath));
throws the following exception when run on the file that I updated via Notepad++:
CRC error: the file being extracted appears to be corrupted. Expected 0x7478AABE, Actual 0xE9191E00
My questions:
Is my approach of modifying the content this way correct?
If yes, how can I write the updated string back to an Excel file?
If my approach is wrong, how can I get rid of the invalid XML characters?
Errors shown when opening the file (after removing the invalid XML char): a first pop-up, and a further error after clicking Yes (screenshots not included).
Thanks in advance!
It does sound like a binary (presumably XLSX) file based on your last comment. To confirm, open the file created by FlexMonster with 7-Zip. If it opens properly and you see a bunch of XML files in folders, it's an XLSX.
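The same check can also be done in code. An XLSX is a ZIP archive, and a ZIP file normally starts with the local file header signature `PK\x03\x04`. An empty archive is the exception: it starts with the end-of-central-directory record instead.

The sketch below only looks at those four bytes. `PackageSniffer` and `LooksLikeZip` are hypothetical names.

```csharp
using System.IO;

// Sketch: an XLSX is a ZIP archive, and a non-empty ZIP file normally begins
// with the local file header signature 'P' 'K' 0x03 0x04.
public static class PackageSniffer
{
    public static bool LooksLikeZip(Stream stream)
    {
        var header = new byte[4];
        int read = 0;
        // Stream.Read may return fewer bytes than requested, so loop until 4 bytes or end of stream
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break;
            read += n;
        }
        return read == 4
            && header[0] == 0x50 && header[1] == 0x4B   // "PK"
            && header[2] == 0x03 && header[3] == 0x04;
    }
}
```

Usage: `using (var fs = File.OpenRead(path)) Console.WriteLine(PackageSniffer.LooksLikeZip(fs));`. A `true` result means the file must be treated as a binary archive rather than as text.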
In that case, a search/replace on the raw file sounds like a very bad idea. It might hit the bytes you want, but it can also change legit bytes in other parts. File.ReadAllText/File.WriteAllText also treat the compressed binary data as text and re-encode it, which would explain both the "corrupted" message in Excel and the CRC error in EPPlus. I think the better approach is to do as @PanagiotisKanavos suggests and use ZipArchive. But you have to rebuild the archive in the right order, otherwise Excel complains. Similar to how it was done here https://stackoverflow.com/a/33312038/1324284, you could do something like this:
public static void ReplaceXmlString(this ZipArchive xlsxZip, FileInfo outFile, string oldString, string newstring)
{
using (var outStream = outFile.Open(FileMode.Create, FileAccess.ReadWrite))
using (var copiedzip = new ZipArchive(outStream, ZipArchiveMode.Update))
{
//Go though each file in the zip one by one and copy over to the new file - entries need to be in order
foreach (var entry in xlsxZip.Entries)
{
var newentry = copiedzip.CreateEntry(entry.FullName);
var newstream = newentry.Open();
var orgstream = entry.Open();
//Copy non-xml files over
if (!entry.Name.EndsWith(".xml"))
{
orgstream.CopyTo(newstream);
}
else
{
//Load the xml document to manipulate
var xdoc = new XmlDocument();
xdoc.Load(orgstream);
var xml = xdoc.OuterXml.Replace(oldString, newstring);
xdoc = new XmlDocument();
xdoc.LoadXml(xml);
xdoc.Save(newstream);
}
orgstream.Close();
newstream.Flush();
newstream.Close();
}
}
}
When it is used like this:
[TestMethod]
public void ReplaceXmlTest()
{
var datatable = new DataTable("tblData");
datatable.Columns.AddRange(new[]
{
new DataColumn("Col1", typeof (int)),
new DataColumn("Col2", typeof (int)),
new DataColumn("Col3", typeof (string))
});
for (var i = 0; i < 10; i++)
{
var row = datatable.NewRow();
row[0] = i;
row[1] = i * 10;
row[2] = i % 2 == 0 ? "ABCD" : "AXCD";
datatable.Rows.Add(row);
}
using (var pck = new ExcelPackage())
{
var workbook = pck.Workbook;
var worksheet = workbook.Worksheets.Add("source");
worksheet.Cells.LoadFromDataTable(datatable, true);
worksheet.Tables.Add(worksheet.Cells["A1:C11"], "Table1");
//Now simulate the copy/open of the excel file into a zip archive
using (var orginalzip = new ZipArchive(new MemoryStream(pck.GetAsByteArray()), ZipArchiveMode.Read))
{
var fi = new FileInfo(@"c:\temp\ReplaceXmlTest.xlsx");
if (fi.Exists)
fi.Delete();
orginalzip.ReplaceXmlString(fi, "AXCD", "REPLACED!!");
}
}
}
In the resulting c:\temp\ReplaceXmlTest.xlsx, every "AXCD" value is replaced with "REPLACED!!".
Just keep in mind that this is completely brute force. Anything you can do to make the file filter smarter, rather than simply processing ALL XML files, would be a very good thing. Maybe limit it to xl/sharedStrings.xml if that is where the problem lies, or to the XML files in the worksheets folder. Hard to say without knowing more about the data.
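Here is a narrower variant along those lines, as a sketch. It copies the entries in their original order but touches only `xl/sharedStrings.xml` and the worksheet XML parts. It reads those parts as text instead of loading them into `XmlDocument`, because the XML parser itself rejects invalid characters, so `xdoc.Load` would likely fail on the broken file.

Assumptions:

- The parts are UTF-8.
- The bad characters appear literally. Characters written as numeric references such as `&#x1;` would need a different fix.

`XlsxSanitizer` and its methods are hypothetical names.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml;

// Sketch: copy an XLSX entry by entry (same order), replacing characters that are not legal
// in XML 1.0 with a space, but only in the parts that normally hold cell text.
// Assumes UTF-8 parts and literal bad characters (not &#x..; references).
public static class XlsxSanitizer
{
    static bool IsTextPart(string name) =>
        name == "xl/sharedStrings.xml" ||
        (name.StartsWith("xl/worksheets/") && name.EndsWith(".xml"));

    public static string ReplaceInvalidXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
                sb.Append(c);
            else if (char.IsHighSurrogate(c) && i + 1 < text.Length
                     && XmlConvert.IsXmlSurrogatePair(text[i + 1], c)) // arguments are (low, high)
            {
                sb.Append(c).Append(text[i + 1]); // keep valid pairs such as emoji
                i++;
            }
            else
                sb.Append(' ');
        }
        return sb.ToString();
    }

    public static void CopyWithoutInvalidXmlChars(Stream source, Stream target)
    {
        using (var inZip = new ZipArchive(source, ZipArchiveMode.Read, leaveOpen: true))
        using (var outZip = new ZipArchive(target, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in inZip.Entries)
            {
                ZipArchiveEntry copy = outZip.CreateEntry(entry.FullName);
                using (Stream input = entry.Open())
                using (Stream output = copy.Open())
                {
                    if (!IsTextPart(entry.FullName))
                    {
                        input.CopyTo(output); // all other parts are copied byte for byte
                        continue;
                    }
                    // Read as text: XmlDocument.Load would throw on the invalid characters
                    string xml = new StreamReader(input, Encoding.UTF8).ReadToEnd();
                    byte[] bytes = new UTF8Encoding(false).GetBytes(ReplaceInvalidXmlChars(xml));
                    output.Write(bytes, 0, bytes.Length);
                }
            }
        }
    }
}
```

The output goes to a separate stream, so the original export is left untouched if something goes wrong. Open the result with EPPlus only after the copy completes.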
I am trying to read a digital signature in an Excel file.
I need to read the Signature Text (the person's name) and the Signature Title (their designation/title under the signature line). I can do it via Interop.Excel and Open XML, but I still need to do the same thing via EPPlus. Is it possible to do this via EPPlus? Here is the code for Interop.Excel:
Excel.Workbook excelWorkbook = excelApp.Workbooks.Open(strFile);
SignatureSet allSignatures = excelWorkbook.Signatures;
foreach (Signature digitalSign in allSignatures)
{
signedTitle = digitalSign.Setup.SuggestedSignerLine2;
signedName = digitalSign.Details.SignatureText;
}
Is this what you need?
using (var xls = new ExcelPackage(fileInfo))
{
var name = xls.Workbook.Properties.Author;
var title = xls.Workbook.Properties.Title;
}
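Note that these are the workbook's document properties, not the fields of a signature line. I don't see any other signature support in EPPlus (other than the Zip-file related code):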
https://github.com/JanKallman/EPPlus/search?q=signature&unscoped_q=signature
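EPPlus does not seem to expose signature lines, but the suggested signer fields can probably be read from the package without Interop.

This sketch rests on an assumption I have not verified against every Excel version: the signature line is a VML shape in an `xl/drawings/*.vml` part, containing an `<o:signatureline>` element whose `o:suggestedsigner` and `o:suggestedsigner2` attributes hold the name and title. The latter should match Interop's `SuggestedSignerLine2`. Unzip a signed file to confirm.

It does not read the text typed when the document was actually signed, which belongs to the digital signature itself. `SignatureLineReader` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Sketch, under an unverified assumption about the file layout: signature lines are VML shapes in
// xl/drawings/*.vml with an <o:signatureline> element carrying o:suggestedsigner (name)
// and o:suggestedsigner2 (title). Unzip a signed workbook to confirm before relying on it.
public static class SignatureLineReader
{
    static readonly XNamespace O = "urn:schemas-microsoft-com:office:office";

    public static List<(string Signer, string Title)> Read(Stream xlsx)
    {
        var result = new List<(string Signer, string Title)>();
        using (var zip = new ZipArchive(xlsx, ZipArchiveMode.Read, leaveOpen: true))
        {
            var vmlParts = zip.Entries.Where(e =>
                e.FullName.StartsWith("xl/drawings/") && e.FullName.EndsWith(".vml"));
            foreach (ZipArchiveEntry entry in vmlParts)
            {
                XDocument vml;
                try
                {
                    using (Stream s = entry.Open())
                        vml = XDocument.Load(s);
                }
                catch (XmlException)
                {
                    continue; // legacy VML is not always well-formed XML; skip what cannot be parsed
                }
                foreach (XElement line in vml.Descendants(O + "signatureline"))
                    result.Add(((string)line.Attribute(O + "suggestedsigner"),
                                (string)line.Attribute(O + "suggestedsigner2")));
            }
        }
        return result;
    }
}
```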
I have a C# data processing application which uses EPPlus to write the final results into an Excel sheet. The background color of each row is changed based on what the data on that row signifies. Time was never an issue, as I only dealt with files below 100MB before. However, my requirements have changed and the files have grown larger, and I have noticed that coloring alone makes my application 60% slower. Removing the coloring makes the application significantly faster. The snippet below is an example of the code I use to color the data to make it visually distinguishable. I'm no EPPlus expert, but is there a way this can be optimized to make my application faster? Or are there better ways to make the rows visually distinct for the people who will end up looking at the data? Any help will be appreciated!
if (data[4] == "3")
{
// color the type 3 messages here
var fill1 = cell1.Style.Fill;
fill1.PatternType = ExcelFillStyle.Solid;
fill1.BackgroundColor.SetColor(Color.LightGray);
}
if (data[4] == "4")
{
var fill1 = cell1.Style.Fill;
fill1.PatternType = ExcelFillStyle.Solid;
fill1.BackgroundColor.SetColor(Color.BlanchedAlmond);
}
EDIT:
This is the code I use to copy the template and write the Excel data into the new worksheet. p is an ExcelPackage which I convert to a byte array before writing it to the Excel file.
Byte[] bin = p.GetAsByteArray();
File.Copy("C:\\Users\\mpas\\Desktop\\template.xlsx", "C:\\Users\\mpas\\Desktop\\result.xlsx");
using (FileStream fs = File.OpenWrite("C:\\Users\\mpas\\Desktop\\result.xlsx")) {
fs.Write(bin, 0, bin.Length);
}
Styling is much faster in EPPlus, and in most Excel APIs, if you use named styles. Create the style once and assign it to cells by name in EPPlus like this:
internal static string YourStyleName = "MyStyle";
ExcelNamedStyleXml yourStyle = excel.Workbook.Styles.CreateNamedStyle(YourStyleName);
yourStyle.Style.Font.Color.SetColor(Color.DarkRed);
yourStyle.Style.Fill.PatternType = ExcelFillStyle.Solid;
yourStyle.Style.Fill.BackgroundColor.SetColor(Color.LemonChiffon);
// ...
sheet.Cells[sourceRange].StyleName = YourStyleName;
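Beyond named styles, another way to cut the number of style operations is to style contiguous rows of the same type as one range instead of cell by cell. Whether this helps in your case is something to measure.

Below is a sketch of the grouping step in plain C#. `RowRuns` is a hypothetical helper. The EPPlus call in the comment assumes the `Cells[fromRow, fromCol, toRow, toCol]` indexer and a named style created once as above.

```csharp
using System.Collections.Generic;

// Hypothetical helper: collapse consecutive rows that share the same message type into
// (FirstRow, LastRow, Type) runs, so each run can be styled with one range assignment.
// Example use with EPPlus (not executed here):
//   foreach (var run in RowRuns.Build(rowTypes, 2))
//       if (run.Type == "3") ws.Cells[run.FirstRow, 1, run.LastRow, lastCol].StyleName = "Type3";
public static class RowRuns
{
    public static List<(int FirstRow, int LastRow, string Type)> Build(IReadOnlyList<string> rowTypes, int firstRow)
    {
        var runs = new List<(int FirstRow, int LastRow, string Type)>();
        int start = 0;
        for (int i = 1; i <= rowTypes.Count; i++)
        {
            // Close the current run at the end of the list or when the type changes
            if (i == rowTypes.Count || rowTypes[i] != rowTypes[start])
            {
                runs.Add((firstRow + start, firstRow + i - 1, rowTypes[start]));
                start = i;
            }
        }
        return runs;
    }
}
```

Rows that already arrive sorted or grouped by type benefit most. Fully interleaved types produce one run per row and gain nothing.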
Here is code to open an existing file.
FileInfo AddressList = new FileInfo(@"c:\test\test.xlsx"); // @ so "\t" is not read as a tab
// Open and read the XLSX file.
try
{
    using (ExcelPackage package = new ExcelPackage(AddressList))
    {
        // Get the workbook in the file
        ExcelWorkbook workBook = package.Workbook;
        if (workBook != null && workBook.Worksheets.Count > 0)
        {
            // Get the first worksheet. EPPlus 4.x indexes worksheets from 1;
            // EPPlus 5+ is 0-based by default, so check your version.
            var worksheet = workBook.Worksheets[1];
            // ... read cells from worksheet here
        }
    }
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}
I'm using EPPlus in a web application with C#. I need to read an Excel file and check its formatting. I tried doing the same as in this article (How do I get partial cell styling in excel using EPpplus?), and all the format properties were fine (bold, italic, color...). However, the one I really need to check is the strikethrough property, and it is always false.
Here is an answer just so the question doesn't hang out there:
[TestMethod]
public void Strike_Format_Test()
{
//http://stackoverflow.com/questions/30517646/how-to-apply-strike-formatting-using-epplus
var existingFile = new FileInfo(@"c:\temp\StrikeFormat.xlsx");
using (var pck = new ExcelPackage(existingFile))
{
var wb = pck.Workbook;
var ws = wb.Worksheets.First();
var cell = ws.Cells["A1"];
Console.WriteLine(cell.Style.Font.Strike);
}
}
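Sometimes the strikethrough is applied to only part of the cell text (rich text, as in the partial-styling question). It is then stored on the individual runs in `xl/sharedStrings.xml` rather than on the cell's font, so `cell.Style.Font.Strike` can stay false.

The sketch below inspects one shared-string item directly with LINQ to XML. It assumes the standard SpreadsheetML markup `<si><r><rPr><strike/></rPr><t>text</t></r></si>`. `RichTextStrike` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Sketch: report, for one shared-string item (<si>), which rich-text runs are struck through.
// Assumes the standard SpreadsheetML layout <si><r><rPr><strike/></rPr><t>text</t></r>...</si>.
public static class RichTextStrike
{
    static readonly XNamespace Main = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    public static List<(string Text, bool Strike)> Runs(XElement si)
    {
        return si.Elements(Main + "r").Select(r =>
        {
            XElement strike = r.Element(Main + "rPr")?.Element(Main + "strike");
            // <strike/> means true; val="0" or val="false" switches it off
            string val = (string)strike?.Attribute("val");
            bool isStruck = strike != null && val != "0" && val != "false";
            return ((string)r.Element(Main + "t") ?? "", isStruck);
        }).ToList();
    }
}
```

A plain `<si><t>text</t></si>` item has no runs and returns an empty list. In that case the cell-level font is what decides the formatting.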