C# Gembox Spreadsheet - Read cell value which references to another excel file - c#

I am working with GemBox Spreadsheet now to read Excel file using C#. In the Excel file, there is cell contains Date value which refers to another excel file.
In the C#, at first, I got zero value. Based on the GemBox Retrieving Calculated Values From a Spreadsheet or Flexcel post, I replaced the Load method into XlsxOptions.PreserveMakeCopy. After replacing, I got some random value, like 43257, instead of Date value. I am not sure what value it is.
Here is the code snippet:
public void Export()
{
SpreadsheetInfo.SetLicense(FileExport.GemBoxLicenseKey);
ExcelFile ef = new ExcelFile();
FileExport file = new FileExport();
ef.LoadXlsx(locationPath + "\\XX.xlsx", GemBox.Spreadsheet.XlsxOptions.PreserveMakeCopy);
ExcelWorksheet ws = ef.Worksheets[0];
try
{
int trackrowcount = 0;
int columncounter = 1;
DateTime date = DateTime.MinValue;
foreach (ExcelRow row in ws.Rows)
{
trackrowcount += 1;
if (trackrowcount < 4)
continue;
columncounter = 1;
foreach (ExcelCell cell in row.AllocatedCells)
{
if (cell.Value != null)
{
if (trackrowcount == 4 && columncounter == 2)
{
var cellFormula = cell.Formula;
var cellValue = cell.Value;
//Here is where I am trying to get the date value
date = Convert.ToDateTime(cellValue);
}
}
columncounter += 1;
}
}
// and so on
}
catch (Exception ex)
{
}
}
Is there any setting that I missed?

You are getting date in OADate format in datatype double. You have to convert it back in DateTime Format.
double dateValue = 43257;
DateTime date = DateTime.FromOADate(dateValue);

Related

Importing Excel data into SQL Server using EPPlus

I am writing a C# code in .NET 6 to read data from Excel and save it into the SQL Server database.
When reading the column with date / date time value, I realized that if the cell contains a time portion, EPPlus will return the cell value as 9/1/2000 8:00:00 AM, but if the cell value only contains date, EPPlus will return the value as an integer, eg: 39580.
I would like to know if it is possible to standardize the return of both cell values to return in integer/double?
My code for reading the value from Excel:
for (int row = 2; row <= sheet.Dimension.Rows; row++)
{
var obj = Activator.CreateInstance(typeof (T)); //generic object
foreach(var prop in typeof (T).GetProperties())
{
if (!string.IsNullOrWhiteSpace(prop.Name))
{
var column = columnInfo.SingleOrDefault(c => !string.IsNullOrWhiteSpace(c.ColumnName) && c.ColumnName.Trim() == prop.Name.Trim());
if (column != null)
{
int col = column.Index;
var val = sheet.Cells[row, col].Value;
var propType = prop.PropertyType;
string ? strVal = "";
if (val != null)
{
strVal = val.ToString();
if (!string.IsNullOrWhiteSpace(strVal))
{
strVal = strVal.Trim();
}
}
prop.SetValue(obj, Convert.ChangeType(strVal, propType));
}
}
}
if (obj != null)
{
list.Add((T) obj);
}
}
Yes, using the EP Plus library and C#, it is possible to standardize the return of both cell values to an integer or a double. A DateTime object can be converted to an OLE Automation date, which is a double value that indicates the number of days from midnight on December 30, 1899, by using the DateTime.ToOADate method. Here's a better way:
using (var package = new ExcelPackage(file))
{
var worksheet = package.Workbook.Worksheets[1];
for (int row = 1; row <= worksheet.Dimension.End.Row; row++)
{
for (int col = 1; col <= worksheet.Dimension.End.Column; col++)
{
var cell = worksheet.Cells[row, col];
if (cell.Value != null)
{
if (cell.Value is DateTime)
{
var dateValue = (DateTime)cell.Value;
var doubleValue = dateValue.ToOADate();
}
else
{
var intValue = (int)cell.Value;
}
}
}
}
}
overall<<
DateTime, will be converted to a double using ToOADate,
and if the cell value is an integer, it will be cast to an int.

Excel treating date column as general but I need it to treat it as Date when exporting to Excel using C#

I am using Visual Studio and writing code in C# to read an excel then format the excel worksheet to what i want then export it back into an excel. After exporting to Excel, when I open that file, it treats the date column as "General" unless I right click on it and set its type to "Date". I need excel to read it as "Date." The date is formatted correctly but Excel not reading it as a Date. Any suggestions on how to fix this would be greatly appreciated. It's alot of code so I hope this is enough for someone to understand what I am trying to do.
// Excel rows are cell arrays so loop through once to determine
// the actual table geometry
int firstRow = xlSheet.FirstRowNum;
int lastRow = xlSheet.LastRowNum;
int firstCol = xlSheet.GetRow(firstRow).FirstCellNum;
int lastCol = xlSheet.GetRow(lastRow).LastCellNum;
for (int rowIdx = firstRow; rowIdx < lastRow; rowIdx++)
{
var r = xlSheet.GetRow(rowIdx);
if (r == null) { continue; } // Nothing here, move on to the next record
int thisFirstCol = xlSheet.GetRow(rowIdx).FirstCellNum;
int thisLastCol = xlSheet.GetRow(rowIdx).LastCellNum;
firstCol = Math.Min(thisFirstCol, firstCol);
lastCol = Math.Max(thisLastCol, lastCol);
}
// System.Diagnostics.Debug.WriteLine($"First Row/Col:{firstRow} / {firstCol} Last Row/Col {lastRow} / {lastCol}");
// Set up the column headers
for (int colIdx = firstCol; colIdx < lastCol; colIdx++)
{
string colName = $"field{colIdx}"; //default
if (useHeader)
{
try
{
// Only if firstRow isn't the header, otherwise you should choose useHeader = false
// when calling this from the adapter.
// Duplicate column name is rare, but it happens.
colName = xlSheet.GetRow(firstRow).GetCell(colIdx).ToString().Trim();
if (String.IsNullOrEmpty(colName) || dt.Columns.Contains(colName))
{
colName = $"field{colIdx}";
}
}
catch { } // May not have one, so the default will be used
}
dt.Columns.Add(colName, typeof(string));
}
// Now loop again to populate the data
DataRow dr = null;
if (useHeader) {
firstRow++;
lastRow++;
}
for (int rowIdx = firstRow; rowIdx < lastRow; rowIdx++)
{
var xlRow = xlSheet.GetRow(rowIdx);
if (xlRow == null) { continue; } // Nothing here, move on to the next record
dr = dt.NewRow();
// We have to account for a bizarre -1 first column index on some Excel files, Gordon is an example
// Datacolumns always start at 0, so we'll increment it independently, the count should be the same
// in theory...
int dtColIdx = -1;
for (int colIdx = firstCol; colIdx < lastCol; colIdx++)
{
dtColIdx++;
try
{
// Convert the value to a reasonable format
ICell cell = xlRow.GetCell(colIdx);
if (cell != null)
{
switch (cell.CellType)
{
case CellType.Formula:
dr[dtColIdx] = cell.NumericCellValue.ToString();
break;
case CellType.Numeric:
if (DateUtil.IsCellDateFormatted(cell))
{
// TODO: DateTime vs Date vs Time check
dr[dtColIdx] = cell.DateCellValue.ToString("MM/dd/yyyy");
// string xx = dr[dtColIdx].ToString();
// xlSheet.GetColumnStyle(dtColIdx).DataFormat = "mm/dd/yyyy";
}
else
{
dr[dtColIdx] = cell.NumericCellValue.ToString();
}
break;
default:
dr[dtColIdx] = cell.ToString();
break;
}
}
}
catch
{
// May be nothing at this address, plug it with a blank
dr[dtColIdx] = String.Empty;
}
}
dt.Rows.Add(dr);
}
dt.AcceptChanges();
//GenFieldParseCode(dt);
return dt;
}

Get Merged Cell Area with EPPLus

I'm using EPPlus to read excel files.
I have a single cell that is part of merged cells. How do I get the merged range that this cell is part of?
For example:
Assume Range ("A1:C1") has been merged.
Given Range "B1" it's Merge property will be true but there isn't a way to get the merged range given a single cell.
How do you get the merged range?
I was hoping for a .MergedRange which would return Range("A1:C1")
There is no such property out of the box but the worksheet has a MergedCells property with an array of all the merged cell addresses in the worksheet and a GetMergeCellId() method which will give you the index for a given cell address.
We can therefore combine these into a little extension method you can use to get the address. Something like this:
public static string GetMergedRangeAddress(this ExcelRange #this)
{
if (#this.Merge)
{
var idx = #this.Worksheet.GetMergeCellId(#this.Start.Row, #this.Start.Column);
return #this.Worksheet.MergedCells[idx-1]; //the array is 0-indexed but the mergeId is 1-indexed...
}
else
{
return #this.Address;
}
}
which you can use as follows:
using (var excel = new ExcelPackage(new FileInfo("inputFile.xlsx")))
{
var ws = excel.Workbook.Worksheets["sheet1"];
var b3address = ws.Cells["B3"].GetMergedRangeAddress();
}
(Note that in the event that you use this method on a multi-celled range it will return the merged cell address for the first cell in the range only)
You can get all merged cells from worksheet, hence
you can find the merged range a specific cell belongs to using the following:
public string GetMergedRange(ExcelWorksheet worksheet, string cellAddress)
{
ExcelWorksheet.MergeCellsCollection mergedCells = worksheet.MergedCells;
foreach (var merged in mergedCells)
{
ExcelRange range = worksheet.Cells[merged];
ExcelCellAddress cell = new ExcelCellAddress(cellAddress);
if (range.Start.Row<=cell.Row && range.Start.Column <= cell.Column)
{
if (range.End.Row >= cell.Row && range.End.Column >= cell.Column)
{
return merged.ToString();
}
}
}
return "";
}
Update:
Turns out that there is a much easier way using EPPLUS, just do the following:
var mergedadress = worksheet.MergedCells[row, column];
For example, if B1 is in a merged range "A1:C1":
var mergedadress = worksheet.MergedCells[1, 2]; //value of mergedadress will be "A1:C1".
2 is the column number because B is the 2nd column.
This will provide you exact width of merged cells:
workSheet.Cells[workSheet.MergedCells[row, col]].Columns
Not a direct answer as Stewart's answer is perfect, but I was lead here looking for a way to get the value of a cell, whether it's part of a larger merged cell or not, so I improved on Stewart's code:
public static string GetVal(this ExcelRange #this)
{
if (#this.Merge)
{
var idx = #this.Worksheet.GetMergeCellId(#this.Start.Row, #this.Start.Column);
string mergedCellAddress = #this.Worksheet.MergedCells[idx - 1];
string firstCellAddress = #this.Worksheet.Cells[mergedCellAddress].Start.Address;
return #this.Worksheet.Cells[firstCellAddress].Value?.ToString()?.Trim() ?? "";
}
else
{
return #this.Value?.ToString()?.Trim() ?? "";
}
}
And call it like this
var worksheet = package.Workbook.Worksheets[i];
var rowCount = worksheet.Dimension.Rows;
var columnCount = worksheet.Dimension.Columns;
for (int row = 1; row <= rowCount; row++)
{
for (int col = 1; col <= columnCount; col++)
{
string val = worksheet.Cells[row, col].GetVal();
}
}

Reading excel file in c# using Microsoft DocumentFormat.OpenXml SDK

I am using Microsoft DocumentFormat.OpenXml SDK to read data from excel file.
While doing so I am taking into consideration if a cell has blank values(If Yes, read that too).
Now, facing issues with one of the excel sheets where the workSheet.SheetDimension is null hence the code is throwing an exception.
Code used :
class OpenXMLHelper
{
// A helper function to open an Excel file using OpenXML, and return a DataTable containing all the data from one
// of the worksheets.
//
// We've had lots of problems reading in Excel data using OLEDB (eg the ACE drivers no longer being present on new servers,
// OLEDB not working due to security issues, and blatantly ignoring blank rows at the top of worksheets), so this is a more
// stable method of reading in the data.
//
public static DataTable ExcelWorksheetToDataTable(string pathFilename)
{
try
{
DataTable dt = new DataTable();
string dimensions = string.Empty;
using (FileStream fs = new FileStream(pathFilename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using (SpreadsheetDocument document = SpreadsheetDocument.Open(fs, false))
{
// Find the sheet with the supplied name, and then use that
// Sheet object to retrieve a reference to the first worksheet.
//Sheet theSheet = document.WorkbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == worksheetName).FirstOrDefault();
//--Sheet theSheet = document.WorkbookPart.Workbook.Descendants<Sheet>().FirstOrDefault();
//--if (theSheet == null)
//-- throw new Exception("Couldn't find the worksheet: "+ theSheet.Id);
// Retrieve a reference to the worksheet part.
//WorksheetPart wsPart = (WorksheetPart)(document.WorkbookPart.GetPartById(theSheet.Id));
//--WorksheetPart wsPart = (WorksheetPart)(document.WorkbookPart.GetPartById(theSheet.Id));
WorkbookPart workbookPart = document.WorkbookPart;
WorksheetPart wsPart = workbookPart.WorksheetParts.FirstOrDefault();
Worksheet workSheet = wsPart.Worksheet;
dimensions = workSheet.SheetDimension.Reference.InnerText; // Get the dimensions of this worksheet, eg "B2:F4"
int numOfColumns = 0;
int numOfRows = 0;
CalculateDataTableSize(dimensions, ref numOfColumns, ref numOfRows);
//System.Diagnostics.Trace.WriteLine(string.Format("The worksheet \"{0}\" has dimensions \"{1}\", so we need a DataTable of size {2}x{3}.", worksheetName, dimensions, numOfColumns, numOfRows));
SheetData sheetData = workSheet.GetFirstChild<SheetData>();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
string[,] cellValues = new string[numOfColumns, numOfRows];
int colInx = 0;
int rowInx = 0;
string value = "";
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
// Iterate through each row of OpenXML data, and store each cell's value in the appropriate slot in our [,] string array.
foreach (Row row in rows)
{
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
// *DON'T* assume there's going to be one XML element for each column in each row...
Cell cell = row.Descendants<Cell>().ElementAt(i);
if (cell.CellValue == null || cell.CellReference == null)
continue; // eg when an Excel cell contains a blank string
// Convert this Excel cell's CellAddress into a 0-based offset into our array (eg "G13" -> [6, 12])
colInx = GetColumnIndexByName(cell.CellReference); // eg "C" -> 2 (0-based)
rowInx = GetRowIndexFromCellAddress(cell.CellReference) - 1; // Needs to be 0-based
// Fetch the value in this cell
value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
value = stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
cellValues[colInx, rowInx] = value;
}
}
// Copy the array of strings into a DataTable.
// We don't (currently) make any attempt to work out which columns should be numeric, rather than string.
for (int col = 0; col < numOfColumns; col++)
{
//dt.Columns.Add("Column_" + col.ToString());
dt.Columns.Add(cellValues[col, 0]);
}
//foreach (Cell cell in rows.ElementAt(0))
//{
// dt.Columns.Add(GetCellValue(doc, cell));
//}
for (int row = 0; row < numOfRows; row++)
{
DataRow dataRow = dt.NewRow();
for (int col = 0; col < numOfColumns; col++)
{
dataRow.SetField(col, cellValues[col, row]);
}
dt.Rows.Add(dataRow);
}
dt.Rows.RemoveAt(0);
//#if DEBUG
// // Write out the contents of our DataTable to the Output window (for debugging)
// string str = "";
// for (rowInx = 0; rowInx < maxNumOfRows; rowInx++)
// {
// for (colInx = 0; colInx < maxNumOfColumns; colInx++)
// {
// object val = dt.Rows[rowInx].ItemArray[colInx];
// str += (val == null) ? "" : val.ToString();
// str += "\t";
// }
// str += "\n";
// }
// System.Diagnostics.Trace.WriteLine(str);
//#endif
return dt;
}
}
}
catch (Exception ex)
{
return null;
}
}
public static void CalculateDataTableSize(string dimensions, ref int numOfColumns, ref int numOfRows)
{
// How many columns & rows of data does this Worksheet contain ?
// We'll read in the Dimensions string from the Excel file, and calculate the size based on that.
// eg "B1:F4" -> we'll need 6 columns and 4 rows.
//
// (We deliberately ignore the top-left cell address, and just use the bottom-right cell address.)
try
{
string[] parts = dimensions.Split(':'); // eg "B1:F4"
if (parts.Length != 2)
throw new Exception("Couldn't find exactly *two* CellAddresses in the dimension");
numOfColumns = 1 + GetColumnIndexByName(parts[1]); // A=1, B=2, C=3 (1-based value), so F4 would return 6 columns
numOfRows = GetRowIndexFromCellAddress(parts[1]);
}
catch
{
throw new Exception("Could not calculate maximum DataTable size from the worksheet dimension: " + dimensions);
}
}
public static int GetRowIndexFromCellAddress(string cellAddress)
{
// Convert an Excel CellReference column into a 1-based row index
// eg "D42" -> 42
// "F123" -> 123
string rowNumber = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^0-9 _]", "");
return int.Parse(rowNumber);
}
public static int GetColumnIndexByName(string cellAddress)
{
// Convert an Excel CellReference column into a 0-based column index
// eg "D42" -> 3
// "F123" -> 5
var columnName = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^A-Z_]", "");
int number = 0, pow = 1;
for (int i = columnName.Length - 1; i >= 0; i--)
{
number += (columnName[i] - 'A' + 1) * pow;
pow *= 26;
}
return number - 1;
}
}[enter image description here][1]
The SheetDimension part is optional (and therefor you cannot always rely on it being up to date). See the following part of the OpenXML specification:
18.3.1.35 dimension (Worksheet Dimensions)
This element specifies the used range of the worksheet. It specifies the row and column bounds of
used cells in the worksheet. This is optional and is not required.
Used cells include cells with formulas, text content, and cell
formatting. When an entire column is formatted, only the first cell in
that column is considered used.
So an Excel file without any SheetDimension part is perfectly valid, so you should not rely on it being present in an Excel file.
Therefor I'd suggest to simply parse all Row elements contained in the SheetData part, and "count" the number of rows (instead of reading the SheetDimensions part to get the number of rows / columns). This way you can also take into account that an Excel file may contain completely blank rows in-between the data.

Having trouble reading excel file with the OpenXML sdk

I have a function that reads from an excel file and stores the results in a DataSet. I have another function that writes to an excel file. When I try to read from a regular human-generated excel file, the excel reading function returns a blank DataSet, but when I read from the excel file generated by the writing function, it works perfectly fine. The function then will not work on a regular generated excel file, even when I just copy and paste the contents of the function generated excel file. I finally tracked it down to this, but I have no idea where to go from here. Is there something wrong with my code?
Here is the excel generating function:
public static Boolean writeToExcel(string fileName, DataSet data)
{
Boolean answer = false;
using (SpreadsheetDocument excelDoc = SpreadsheetDocument.Create(tempPath + fileName, SpreadsheetDocumentType.Workbook))
{
WorkbookPart workbookPart = excelDoc.AddWorkbookPart();
workbookPart.Workbook = new Workbook();
WorksheetPart worksheetPart = workbookPart.AddNewPart<WorksheetPart>();
Sheets sheets = excelDoc.WorkbookPart.Workbook.AppendChild<Sheets>(new Sheets());
Sheet sheet = new Sheet()
{
Id = excelDoc.WorkbookPart.GetIdOfPart(worksheetPart),
SheetId = 1,
Name = "Page1"
};
sheets.Append(sheet);
CreateWorkSheet(worksheetPart, data);
answer = true;
}
return answer;
}
private static void CreateWorkSheet(WorksheetPart worksheetPart, DataSet data)
{
Worksheet worksheet = new Worksheet();
SheetData sheetData = new SheetData();
UInt32Value currRowIndex = 1U;
int colIndex = 0;
Row excelRow;
DataTable table = data.Tables[0];
for (int rowIndex = -1; rowIndex < table.Rows.Count; rowIndex++)
{
excelRow = new Row();
excelRow.RowIndex = currRowIndex++;
for (colIndex = 0; colIndex < table.Columns.Count; colIndex++)
{
Cell cell = new Cell()
{
CellReference = Convert.ToString(Convert.ToChar(65 + colIndex)),
DataType = CellValues.String
};
CellValue cellValue = new CellValue();
if (rowIndex == -1)
{
cellValue.Text = table.Columns[colIndex].ColumnName.ToString();
}
else
{
cellValue.Text = (table.Rows[rowIndex].ItemArray[colIndex].ToString() != "") ? table.Rows[rowIndex].ItemArray[colIndex].ToString() : "*";
}
cell.Append(cellValue);
excelRow.Append(cell);
}
sheetData.Append(excelRow);
}
SheetFormatProperties formattingProps = new SheetFormatProperties()
{
DefaultColumnWidth = 20D,
DefaultRowHeight = 20D
};
worksheet.Append(formattingProps);
worksheet.Append(sheetData);
worksheetPart.Worksheet = worksheet;
}
while the reading function is as following:
public static void readInventoryExcel(string fileName, ref DataSet set)
{
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
{
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
int count = -1;
foreach (Row r in sheetData.Elements<Row>())
{
if (count >= 0)
{
DataRow row = set.Tables[0].NewRow();
row["SerialNumber"] = r.ChildElements[1].InnerXml;
row["PartNumber"] = r.ChildElements[2].InnerXml;
row["EntryDate"] = r.ChildElements[3].InnerXml;
row["RetirementDate"] = r.ChildElements[4].InnerXml;
row["ReasonForReplacement"] = r.ChildElements[5].InnerXml;
row["RetirementTech"] = r.ChildElements[6].InnerXml;
row["IncludeInMaintenance"] = r.ChildElements[7].InnerXml;
row["MaintenanceTech"] = r.ChildElements[8].InnerXml;
row["Comment"] = r.ChildElements[9].InnerXml;
row["Station"] = r.ChildElements[10].InnerXml;
row["LocationStatus"] = r.ChildElements[11].InnerXml;
row["AssetName"] = r.ChildElements[12].InnerXml;
row["InventoryType"] = r.ChildElements[13].InnerXml;
row["Description"] = r.ChildElements[14].InnerXml;
set.Tables[0].Rows.Add(row);
}
count++;
}
}
}
I think this is caused by the fact that you have only one sheet whereas Excel has three. I'm not certain but I think the sheets are returned in reverse order so you should change the line:
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
to
WorksheetPart worksheetPart = workbookPart.WorksheetParts.Last();
It might be safer to search for the WorksheetPart if you can identify it by the sheet name. You need to find the Sheet first then use the Id of that to find the SheetPart:
private WorksheetPart GetWorksheetPartBySheetName(WorkbookPart workbookPart, string sheetName)
{
//find the sheet first.
IEnumerable<Sheet> sheets = workbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>().Where(s => s.Name == sheetName);
if (sheets.Count() > 0)
{
string relationshipId = sheets.First().Id.Value;
WorksheetPart worksheetPart = (WorksheetPart)workbookPart.GetPartById(relationshipId);
return worksheetPart;
}
return null;
}
You can then use:
WorksheetPart worksheetPart = GetWorksheetPartBySheetName(workbookPart, "Sheet1");
There are a couple of other things I've noticed whilst looking at your code which you may (or may not!) be interested in:
In your code you are only reading the InnerXml so it might not matter to you but the way Excel stores strings is different to the way you are writing them so reading an Excel generated file may not give you the values you expect. In your example you are writing the string directly to the cell like this:
But Excel uses a SharedStrings concept where all strings are written to a separate XML file called sharedStrings.xml. That file contains the strings used in the Excel file with a reference and it's that value that is stored in the cell value in the sheet XML.
The sharedString.xml looks like this:
And the Cell then looks like this:
The 47 in the <v> element is a reference to the 47th shared string. Note that the type (the t attribute) in your generated XML is str but the type in the Excel generated file is s. This denotes yours is an inline string and theirs is a shared string.
You can read the SharedStrings just as you would any other part:
var stringTable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
if (stringTable != null)
{
sharedString = stringTable.SharedStringTable.ElementAt(int.Parse(value)).InnerText;
}
Secondly, if you look at the cell reference that your code generates and the cell reference that Excel generates you can see you are only outputting the column and not the row (e.g. you output A instead of A1). To fix this you should change the line:
CellReference = Convert.ToString(Convert.ToChar(65 + colIndex)),
to
CellReference = Convert.ToString(Convert.ToChar(65 + colIndex) + rowIndex.ToString()),
I hope that helps.
I ran into a similar issue a while back trying to do this for Word documents (procedurally generated worked fine, but human-generated did not). I found this tool to be very helpful:
http://www.microsoft.com/en-us/download/details.aspx?id=30425
Basically, it looks at a file and shows you the code that Microsoft would generate to read it, as well as the xml structure of the file itself. As usual for Microsoft products, there are quite a few menus and it's not very intuitive, but after clicking around for a bit you will be able to see exactly what is going on with any two files. I would recommend you open a working excel file and a non-working one and compare the difference to see what's causing your issue.
Below is the OpenXML code that I use to read in a particular Worksheet from an Excel file, into a DataTable.
First, here's how you'd call it:
DataTable dt = OpenXMLHelper.ExcelWorksheetToDataTable("C:\\SQL Server\\SomeExcelFile.xlsx", "Mikes Worksheet");
And here's the code:
public class OpenXMLHelper
{
public static DataTable ExcelWorksheetToDataTable(string pathFilename, string worksheetName)
{
DataTable dt = new DataTable(worksheetName);
using (SpreadsheetDocument document = SpreadsheetDocument.Open(pathFilename, false))
{
// Find the sheet with the supplied name, and then use that
// Sheet object to retrieve a reference to the first worksheet.
Sheet theSheet = document.WorkbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == worksheetName).FirstOrDefault();
if (theSheet == null)
throw new Exception("Couldn't find the worksheet: " + worksheetName);
// Retrieve a reference to the worksheet part.
WorksheetPart wsPart = (WorksheetPart)(document.WorkbookPart.GetPartById(theSheet.Id));
Worksheet workSheet = wsPart.Worksheet;
string dimensions = workSheet.SheetDimension.Reference.InnerText; // Get the dimensions of this worksheet, eg "B2:F4"
int numOfColumns = 0;
int numOfRows = 0;
CalculateDataTableSize(dimensions, ref numOfColumns, ref numOfRows);
System.Diagnostics.Trace.WriteLine(string.Format("The worksheet \"{0}\" has dimensions \"{1}\", so we need a DataTable of size {2}x{3}.", worksheetName, dimensions, numOfColumns, numOfRows));
SheetData sheetData = workSheet.GetFirstChild<SheetData>();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
string[,] cellValues = new string[numOfColumns, numOfRows];
int colInx = 0;
int rowInx = 0;
string value = "";
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
// Iterate through each row of OpenXML data
foreach (Row row in rows)
{
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
// *DON'T* assume there's going to be one XML element for each item in each row...
Cell cell = row.Descendants<Cell>().ElementAt(i);
if (cell.CellValue == null || cell.CellReference == null)
continue; // eg when an Excel cell contains a blank string
// Convert this Excel cell's CellAddress into a 0-based offset into our array (eg "G13" -> [6, 12])
colInx = GetColumnIndexByName(cell.CellReference); // eg "C" -> 2 (0-based)
rowInx = GetRowIndexFromCellAddress(cell.CellReference)-1; // Needs to be 0-based
// Fetch the value in this cell
value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
value = stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
cellValues[colInx, rowInx] = value;
}
dt.Rows.Add(dataRow);
}
// Copy the array of strings into a DataTable
for (int col = 0; col < numOfColumns; col++)
dt.Columns.Add("Column_" + col.ToString());
for (int row = 0; row < numOfRows; row++)
{
DataRow dataRow = dt.NewRow();
for (int col = 0; col < numOfColumns; col++)
{
dataRow.SetField(col, cellValues[col, row]);
}
dt.Rows.Add(dataRow);
}
#if DEBUG
// Write out the contents of our DataTable to the Output window (for debugging)
string str = "";
for (rowInx = 0; rowInx < maxNumOfRows; rowInx++)
{
for (colInx = 0; colInx < maxNumOfColumns; colInx++)
{
object val = dt.Rows[rowInx].ItemArray[colInx];
str += (val == null) ? "" : val.ToString();
str += "\t";
}
str += "\n";
}
System.Diagnostics.Trace.WriteLine(str);
#endif
return dt;
}
}
private static void CalculateDataTableSize(string dimensions, ref int numOfColumns, ref int numOfRows)
{
// How many columns & rows of data does this Worksheet contain ?
// We'll read in the Dimensions string from the Excel file, and calculate the size based on that.
// eg "B1:F4" -> we'll need 6 columns and 4 rows.
//
// (We deliberately ignore the top-left cell address, and just use the bottom-right cell address.)
try
{
string[] parts = dimensions.Split(':'); // eg "B1:F4"
if (parts.Length != 2)
throw new Exception("Couldn't find exactly *two* CellAddresses in the dimension");
numOfColumns = 1 + GetColumnIndexByName(parts[1]); // A=1, B=2, C=3 (1-based value), so F4 would return 6 columns
numOfRows = GetRowIndexFromCellAddress(parts[1]);
}
catch
{
throw new Exception("Could not calculate maximum DataTable size from the worksheet dimension: " + dimensions);
}
}
public static int GetRowIndexFromCellAddress(string cellAddress)
{
// Convert an Excel CellReference column into a 1-based row index
// eg "D42" -> 42
// "F123" -> 123
string rowNumber = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^0-9 _]", "");
return int.Parse(rowNumber);
}
public static int GetColumnIndexByName(string cellAddress)
{
// Convert an Excel CellReference column into a 0-based column index
// eg "D42" -> 3
// "F123" -> 5
var columnName = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^A-Z_]", "");
int number = 0, pow = 1;
for (int i = columnName.Length - 1; i >= 0; i--)
{
number += (columnName[i] - 'A' + 1) * pow;
pow *= 26;
}
return number - 1;
}
}
Just to mention, some of our company's Excel Worksheets have one or more blank rows at the top. Strangely, this prevented some other OpenXML libraries from reading in such Worksheets properly.
This code deliberately creates a DataTable with one value for each of the cells in the Worksheet, even the blank ones at the top.

Categories

Resources