OpenXML how to get cell in range

Please help me get the cells in a range (e.g. from A1 to E11, i.e. all the cells in that rectangle).
For now, my idea is:
Worksheet worksheet = GetWorksheet(document, sheetName);
SheetData sheetData = worksheet.GetFirstChild<SheetData>();
IEnumerable<Cell> cells = sheetData.Descendants<Cell>().Where(c =>
c.CellReference >= A:1 &&
c.CellReference <= E:11 &&
);
int t = cells.Count();
But this code does not work.
Thanks

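It won't be that easy to compare a cell's CellReference with a string, and yes, what you are currently doing is wrong. You simply cannot compare strings as higher or lower in that way: C# has no >= operator for strings, and even an ordinal comparison doesn't follow cell order, because "A10" sorts before "A9".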
You have two options.
Option 1:
Take the cell reference and break it down: separate the letters from the digits, convert each part to a number, and compare them individually.
A1 -> A and 1 -> with A = 1 you get (1, 1)
E11 -> E and 11 -> with E = 5 you get (5, 11)
So you will need to break down each CellReference and check whether it falls within your range.
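Option 1 can be sketched as a small helper that splits a reference such as "E11" into a 1-based column number and row number, then checks both against the corners of the range. This is a minimal sketch, not part of the OpenXML SDK: the class and method names (CellRangeHelper, ParseReference, IsInRange) are hypothetical, and it assumes plain references like "A1" or "AB12" (no sheet names or "$" signs).

```csharp
using System;

// Hypothetical helper (not part of the OpenXML SDK) for comparing cell references.
public static class CellRangeHelper
{
    // Splits "E11" into (Column: 5, Row: 11). Columns are 1-based: A = 1, Z = 26, AA = 27.
    public static (int Column, int Row) ParseReference(string reference)
    {
        if (string.IsNullOrEmpty(reference))
            throw new ArgumentException("Empty cell reference", nameof(reference));

        int column = 0;
        int i = 0;
        // The letters form a base-26 number whose digits A..Z stand for 1..26.
        while (i < reference.Length && char.IsLetter(reference[i]))
        {
            column = column * 26 + (char.ToUpperInvariant(reference[i]) - 'A' + 1);
            i++;
        }
        // Whatever follows the letters is the 1-based row number.
        if (column == 0 || !int.TryParse(reference.Substring(i), out int row) || row < 1)
            throw new FormatException("Not a cell reference: " + reference);
        return (column, row);
    }

    // True when 'reference' lies inside the rectangle spanned by topLeft and bottomRight.
    public static bool IsInRange(string reference, string topLeft, string bottomRight)
    {
        var cell = ParseReference(reference);
        var min = ParseReference(topLeft);
        var max = ParseReference(bottomRight);
        return cell.Column >= min.Column && cell.Column <= max.Column
            && cell.Row >= min.Row && cell.Row <= max.Row;
    }
}
```

With a helper like this, the filter from the question can stay a LINQ query, e.g. `sheetData.Descendants<Cell>().Where(c => c.CellReference != null && CellRangeHelper.IsInRange(c.CellReference.Value, "A1", "E11"))`. Keep in mind that this only sees cells that are physically stored in the sheet XML.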
Option 2:
As you can see above, this is simply a 2D matrix index (e.g. (1, 1) and (5, 11), in COLUMN, ROW format), and you can use that directly in the comparison. The catch is that you can't do this with a simple LINQ query; you need to iterate through the rows and columns yourself. I've put together the following example code; try it:
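// Requires: using System; using System.Collections.Generic; using System.Linq;
// using DocumentFormat.OpenXml.Packaging; using DocumentFormat.OpenXml.Spreadsheet;
using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open("PATH", false))
{
// Get the workbook part
WorkbookPart workbookPart = myDoc.WorkbookPart;
// Get the shared string table part (null if the workbook has no shared strings)
var stringtable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
// Then access the worksheet parts
IEnumerable<WorksheetPart> worksheetPart = workbookPart.WorksheetParts;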
foreach (WorksheetPart WSP in worksheetPart)
{
//find sheet data
IEnumerable<SheetData> sheetData = WSP.Worksheet.Elements<SheetData>();
int RowCount = 0;
int CellCount = 0;
// This is A1
int RowMin = 1;
int ColMin = 1;
//This is E11
int RowMax = 11;
int ColMax = 5;
foreach (SheetData SD in sheetData)
{
foreach (Row row in SD.Elements<Row>())
{
RowCount++; // We are in a new row
// For each cell we need to identify type
foreach (Cell cell in row.Elements<Cell>())
{
// We are in a new Cell
CellCount++;
if ((RowCount >= RowMin && CellCount >= ColMin) && (RowCount <= RowMax && CellCount <= ColMax))
{
if (cell.DataType == null && cell.CellValue != null)
{
// No data type: a plain number
Console.WriteLine(cell.CellValue.Text);
}
else if (cell.CellValue == null)
{
// Nothing stored in this cell (e.g. formatting only)
Console.WriteLine("(empty cell)");
}
else if (cell.DataType.Value == CellValues.Boolean)
{
// Booleans (stored as 0 or 1)
Console.WriteLine(cell.CellValue.Text);
}
else if (cell.DataType.Value == CellValues.SharedString)
{
// A shared string: the cell value holds the index into the shared string table
if (stringtable != null)
{
Console.WriteLine(stringtable.SharedStringTable.ElementAt(int.Parse(cell.CellValue.Text)).InnerText);
}
}
else
{
// Other types (e.g. t="str" formula strings): the stored text is the value
Console.WriteLine(cell.CellValue.Text);
}
}
}
// Reset Cell count
CellCount = 0;
}
}
}
}
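This actually works; I tested it. Note, though, that it finds each cell's position by counting Row and Cell elements, so it assumes every row and cell up to E11 is actually present in the XML. Excel usually omits empty cells and rows, so for sparse sheets compare row.RowIndex and the parsed CellReference instead (as in Option 1).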

c#: Is there a way to retrieve the cell address in excel from where data begins?

I'm trying to copy Excel data from one sheet to another. It's working fine, but there's a problem: if the data in the source file doesn't start at cell A1 (consider the image below), I want to copy the data starting from cell B5. Here, "Some header" is not required; the actual data starts at the Emp ID cell.
What I've tried is providing a textbox to enter the cell address and then copying the data starting from that address. But this introduces manual intervention, and I want it automated. Any help is appreciated. Thanks.

Assuming some basic criteria, the following code should do it. The criteria I assume are: 1) if a row contains any merged cells (like your "Some Header"), it isn't the start row; 2) the start cell has text in the cell to its right and in the cell below it.
private static bool RowIsEmpty(Range range)
{
foreach (object obj in (object[,])range.Value2)
{
if (obj != null && obj.ToString() != "")
{
return false;
}
}
return true;
}
private static bool CellIsEmpty(Range cell)
{
if (cell.Value2 != null && cell.Value2.ToString() != "")
{
return false;
}
return true;
}
private Tuple<int, int> ExcelFindStartCell()
{
var excelApp = new Microsoft.Office.Interop.Excel.Application();
excelApp.Visible = true;
Workbook workbook = excelApp.Workbooks.Open("test.xlsx");
Worksheet worksheet = excelApp.ActiveSheet;
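// Go through each row of the used range only; looping over all of
// worksheet.Rows (1,048,576 rows in an .xlsx sheet) is very slow, and
// "row < worksheet.Rows.Count" would also skip the last row.
Range usedRange = worksheet.UsedRange;
int lastRow = usedRange.Row + usedRange.Rows.Count - 1;
for (int row = 1; row <= lastRow; row++)
{
Range range = worksheet.Rows[row];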
// Check if the row is empty.
if (RowIsEmpty(range))
{
continue;
}
// Check if the row contains any merged cells, if so we'll assume it's
// some kind of header and move on.
object mergedCells = range.MergeCells;
if (mergedCells == DBNull.Value || (bool)mergedCells)
{
continue;
}
// Find the first column that contains text in this row.
for (int col = 1; col < range.Columns.Count; col++)
{
Range cell = range.Cells[1, col];
if (CellIsEmpty(cell))
{
continue;
}
// Now check if the cell to the right also contains text.
Range rightCell = worksheet.Cells[row, col + 1];
if (CellIsEmpty(rightCell))
{
// No text in right cell, try the next row.
break;
}
// Now check if cell below also contains text.
Range bottomCell = worksheet.Cells[row + 1, col];
if (CellIsEmpty(bottomCell))
{
// No text in bottom cell, try the next row.
break;
}
// Success!
workbook.Close();
excelApp.Quit();
return new Tuple<int, int>(row, col);
}
}
// Didn't find anything that matched the criteria.
workbook.Close();
excelApp.Quit();
return null;
}
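Because the Interop version needs a running Excel instance, here is a sketch of the same two criteria applied to an in-memory grid, which makes the heuristic easy to test on its own. It rests on assumptions: the grid is a hypothetical `string[,]` of cell texts (index 0 = Excel row/column 1), and merged-cell information is passed in as a hypothetical `bool[]`, since a plain grid cannot represent merging.

```csharp
using System;

public static class StartCellFinder
{
    // Returns the 1-based (row, column) of the first cell that
    //  1) is in a row without merged cells, and
    //  2) has text in the cell to its right and in the cell below it.
    // Returns null when nothing matches, like the Interop version.
    public static Tuple<int, int> Find(string[,] grid, bool[] rowHasMergedCells)
    {
        int rows = grid.GetLength(0);
        int cols = grid.GetLength(1);
        for (int r = 0; r < rows; r++)
        {
            // Rows with merged cells are assumed to be headers such as "Some Header".
            if (rowHasMergedCells[r])
                continue;
            for (int c = 0; c < cols; c++)
            {
                if (IsEmpty(grid, r, c))
                    continue;
                // First non-empty cell in this row: it needs neighbours to the right and below.
                if (!IsEmpty(grid, r, c + 1) && !IsEmpty(grid, r + 1, c))
                    return Tuple.Create(r + 1, c + 1);
                break; // otherwise try the next row, as the Interop code does
            }
        }
        return null;
    }

    // Positions outside the grid count as empty.
    private static bool IsEmpty(string[,] grid, int r, int c)
    {
        if (r >= grid.GetLength(0) || c >= grid.GetLength(1))
            return true;
        return string.IsNullOrEmpty(grid[r, c]);
    }
}
```

Empty rows need no special case here: the inner loop finds no text and simply moves on to the next row.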

Reading excel file in c# using Microsoft DocumentFormat.OpenXml SDK

I am using the Microsoft DocumentFormat.OpenXml SDK to read data from an Excel file.
While doing so, I also take blank cells into account (if a cell is blank, I read it too).
Now I'm facing an issue with one of the Excel sheets: workSheet.SheetDimension is null, so the code throws an exception.
Code used:
class OpenXMLHelper
{
// A helper function to open an Excel file using OpenXML, and return a DataTable containing all the data from one
// of the worksheets.
//
// We've had lots of problems reading in Excel data using OLEDB (eg the ACE drivers no longer being present on new servers,
// OLEDB not working due to security issues, and blatantly ignoring blank rows at the top of worksheets), so this is a more
// stable method of reading in the data.
//
public static DataTable ExcelWorksheetToDataTable(string pathFilename)
{
try
{
DataTable dt = new DataTable();
string dimensions = string.Empty;
using (FileStream fs = new FileStream(pathFilename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using (SpreadsheetDocument document = SpreadsheetDocument.Open(fs, false))
{
// Find the sheet with the supplied name, and then use that
// Sheet object to retrieve a reference to the first worksheet.
//Sheet theSheet = document.WorkbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == worksheetName).FirstOrDefault();
//--Sheet theSheet = document.WorkbookPart.Workbook.Descendants<Sheet>().FirstOrDefault();
//--if (theSheet == null)
//-- throw new Exception("Couldn't find the worksheet: "+ theSheet.Id);
// Retrieve a reference to the worksheet part.
//WorksheetPart wsPart = (WorksheetPart)(document.WorkbookPart.GetPartById(theSheet.Id));
//--WorksheetPart wsPart = (WorksheetPart)(document.WorkbookPart.GetPartById(theSheet.Id));
WorkbookPart workbookPart = document.WorkbookPart;
WorksheetPart wsPart = workbookPart.WorksheetParts.FirstOrDefault();
Worksheet workSheet = wsPart.Worksheet;
dimensions = workSheet.SheetDimension.Reference.InnerText; // Get the dimensions of this worksheet, eg "B2:F4"
int numOfColumns = 0;
int numOfRows = 0;
CalculateDataTableSize(dimensions, ref numOfColumns, ref numOfRows);
//System.Diagnostics.Trace.WriteLine(string.Format("The worksheet \"{0}\" has dimensions \"{1}\", so we need a DataTable of size {2}x{3}.", worksheetName, dimensions, numOfColumns, numOfRows));
SheetData sheetData = workSheet.GetFirstChild<SheetData>();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
string[,] cellValues = new string[numOfColumns, numOfRows];
int colInx = 0;
int rowInx = 0;
string value = "";
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
// Iterate through each row of OpenXML data, and store each cell's value in the appropriate slot in our [,] string array.
foreach (Row row in rows)
{
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
// *DON'T* assume there's going to be one XML element for each column in each row...
Cell cell = row.Descendants<Cell>().ElementAt(i);
if (cell.CellValue == null || cell.CellReference == null)
continue; // eg when an Excel cell contains a blank string
// Convert this Excel cell's CellAddress into a 0-based offset into our array (eg "G13" -> [6, 12])
colInx = GetColumnIndexByName(cell.CellReference); // eg "C" -> 2 (0-based)
rowInx = GetRowIndexFromCellAddress(cell.CellReference) - 1; // Needs to be 0-based
// Fetch the value in this cell
value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
value = stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
cellValues[colInx, rowInx] = value;
}
}
// Copy the array of strings into a DataTable.
// We don't (currently) make any attempt to work out which columns should be numeric, rather than string.
for (int col = 0; col < numOfColumns; col++)
{
//dt.Columns.Add("Column_" + col.ToString());
dt.Columns.Add(cellValues[col, 0]);
}
//foreach (Cell cell in rows.ElementAt(0))
//{
// dt.Columns.Add(GetCellValue(doc, cell));
//}
for (int row = 0; row < numOfRows; row++)
{
DataRow dataRow = dt.NewRow();
for (int col = 0; col < numOfColumns; col++)
{
dataRow.SetField(col, cellValues[col, row]);
}
dt.Rows.Add(dataRow);
}
dt.Rows.RemoveAt(0);
//#if DEBUG
// // Write out the contents of our DataTable to the Output window (for debugging)
// string str = "";
// for (rowInx = 0; rowInx < maxNumOfRows; rowInx++)
// {
// for (colInx = 0; colInx < maxNumOfColumns; colInx++)
// {
// object val = dt.Rows[rowInx].ItemArray[colInx];
// str += (val == null) ? "" : val.ToString();
// str += "\t";
// }
// str += "\n";
// }
// System.Diagnostics.Trace.WriteLine(str);
//#endif
return dt;
}
}
}
catch (Exception ex)
{
return null;
}
}
public static void CalculateDataTableSize(string dimensions, ref int numOfColumns, ref int numOfRows)
{
// How many columns & rows of data does this Worksheet contain ?
// We'll read in the Dimensions string from the Excel file, and calculate the size based on that.
// eg "B1:F4" -> we'll need 6 columns and 4 rows.
//
// (We deliberately ignore the top-left cell address, and just use the bottom-right cell address.)
try
{
string[] parts = dimensions.Split(':'); // eg "B1:F4"
if (parts.Length != 2)
throw new Exception("Couldn't find exactly *two* CellAddresses in the dimension");
numOfColumns = 1 + GetColumnIndexByName(parts[1]); // A=1, B=2, C=3 (1-based value), so F4 would return 6 columns
numOfRows = GetRowIndexFromCellAddress(parts[1]);
}
catch
{
throw new Exception("Could not calculate maximum DataTable size from the worksheet dimension: " + dimensions);
}
}
public static int GetRowIndexFromCellAddress(string cellAddress)
{
// Convert an Excel CellReference into a 1-based row index
// eg "D42" -> 42
// "F123" -> 123
string rowNumber = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^0-9 _]", "");
return int.Parse(rowNumber);
}
public static int GetColumnIndexByName(string cellAddress)
{
// Convert an Excel CellReference column into a 0-based column index
// eg "D42" -> 3
// "F123" -> 5
var columnName = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^A-Z_]", "");
int number = 0, pow = 1;
for (int i = columnName.Length - 1; i >= 0; i--)
{
number += (columnName[i] - 'A' + 1) * pow;
pow *= 26;
}
return number - 1;
}
}

The SheetDimension element is optional (and therefore you cannot always rely on it being present or up to date). See the following part of the OpenXML specification:
18.3.1.35 dimension (Worksheet Dimensions)
This element specifies the used range of the worksheet. It specifies the row and column bounds of
used cells in the worksheet. This is optional and is not required.
Used cells include cells with formulas, text content, and cell
formatting. When an entire column is formatted, only the first cell in
that column is considered used.
So an Excel file without any SheetDimension element is perfectly valid, and you should not rely on it being present.
Therefore I'd suggest simply parsing all Row elements contained in the SheetData element and "counting" the rows yourself (instead of reading SheetDimension to get the number of rows and columns). Use each Row's RowIndex rather than a plain element count: this way you can also take into account that an Excel file may contain completely blank rows in between the data, which Excel may leave out of the XML entirely.
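Following that suggestion, the size of the DataTable can be derived from the cells that are actually present instead of from SheetDimension. The sketch below assumes you have already collected the CellReference strings (for example with `sheetData.Descendants<Cell>().Select(c => c.CellReference?.Value)`); the class and method names are hypothetical, and null references are skipped because the r attribute is optional as well.

```csharp
using System;
using System.Collections.Generic;

public static class SheetBounds
{
    // Returns the largest 1-based column number and row number among the references,
    // e.g. { "B2", "F4", "C9" } -> (6, 9). Returns (0, 0) when there are no references.
    public static (int MaxColumn, int MaxRow) FromReferences(IEnumerable<string> references)
    {
        int maxColumn = 0, maxRow = 0;
        foreach (string reference in references)
        {
            if (string.IsNullOrEmpty(reference))
                continue; // the r attribute is optional, so some cells may lack it

            // Column letters -> 1-based number (A = 1, Z = 26, AA = 27).
            int column = 0, i = 0;
            while (i < reference.Length && char.IsLetter(reference[i]))
            {
                column = column * 26 + (char.ToUpperInvariant(reference[i]) - 'A' + 1);
                i++;
            }
            if (!int.TryParse(reference.Substring(i), out int row))
                continue; // skip anything that isn't a plain "A1"-style reference

            maxColumn = Math.Max(maxColumn, column);
            maxRow = Math.Max(maxRow, row);
        }
        return (maxColumn, maxRow);
    }
}
```

In the question's code, numOfColumns and numOfRows would then come from MaxColumn and MaxRow instead of from CalculateDataTableSize(dimensions, ...).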

How to read data fast from an excel and convert it to list from file stream

I am using EPPlus.
The Excel file I am uploading has column headers in row 2, and from row 4 onward it has the data, which can be up to 2k records.
The way I am doing it, it takes a lot of time to read 2k records and put them into a list.
using (var excel = new ExcelPackage(hpf.InputStream))
{
var ws = excel.Workbook.Worksheets["Sheet1"];
//Read the file into memory
for (int rw = 4; rw <= ws.Dimension.End.Row; rw++)
{
if (!ws.Cells[rw, 1, rw, 24].All(c => c.Value == null))
{
int headerRow = 2;
GroupMembershipUploadInput gm = new GroupMembershipUploadInput();
for (int col = ws.Dimension.Start.Column; col <= ws.Dimension.End.Column; col++)
{
var s = ws.Cells[rw, col].Value;
if (ws.Cells[headerRow, col].Value.ToString().Equals("Existing Constituent Master Id"))
{
gm.cnst_mstr_id = (ws.Cells[rw, col].Value ?? (Object)"").ToString();
}
else if (ws.Cells[headerRow, col].Value.ToString().Equals("Prefix of the constituent(Mr, Mrs etc)"))
{
gm.cnst_prefix_nm = (ws.Cells[rw, col].Value ?? (Object)"").ToString();
}
}
lgl.GroupMembershipUploadInputList.Add(gm);
}
}
GroupMembershipUploadInputList is the list of GroupMembershipUploadInput objects that I add the Excel values to after reading them cell by cell.
Can it be done faster? What am I missing here?
Please help me improve the performance.

You are doing a lot of iterations there: for each row you visit every column, and for each column you read and compare the header text twice. Assuming you only need those two values per row, the following code should reduce the time drastically:
using (var excel = new ExcelPackage(hpf.InputStream))
{
var ws = excel.Workbook.Worksheets["Sheet1"];
int headerRow = 2;
// hold the colum index based on the value in the header
int col_cnst_mstr_id = 2;
int col_cnst_prefix_nm = 4;
// loop once over the columns to fetch the column index
for (int col = ws.Dimension.Start.Column; col <= ws.Dimension.End.Column; col++)
{
if ("Existing Constituent Master Id".Equals(ws.Cells[headerRow, col].Value))
{
col_cnst_mstr_id = col;
}
if ("Prefix of the constituent(Mr, Mrs etc)".Equals(ws.Cells[headerRow, col].Value))
{
col_cnst_prefix_nm = col;
}
}
//Read the file into memory
// loop over all rows
for (int rw = 4; rw <= ws.Dimension.End.Row; rw++)
{
// check if both values are not null
if (ws.Cells[rw, col_cnst_mstr_id].Value != null &&
ws.Cells[rw, col_cnst_prefix_nm].Value != null)
{
// the correct cell will be selcted based on the column index
var gm = new GroupMembershipUploadInput
{
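// Use ToString() rather than a (string) cast, which throws for numeric cells (EPPlus returns numbers as double)
cnst_mstr_id = ws.Cells[rw, col_cnst_mstr_id].Value?.ToString() ?? String.Empty,
cnst_prefix_nm = ws.Cells[rw, col_cnst_prefix_nm].Value?.ToString() ?? String.Empty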
};
lgl.GroupMembershipUploadInputList.Add(gm);
}
}
}
I removed the inner column loop from the row loop and moved it to the start of the method, where it is used only to find the column index of each field you're interested in. The expensive null check across all 24 cells of each row is reduced to two checks, and fetching a value is now a simple index lookup in the row.
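The same idea scales if you need more than two fields: read the header row once into a dictionary from header text to column number, and every per-row lookup becomes a single dictionary access. This is a sketch; HeaderIndex.Build is a hypothetical helper that takes the header values as a plain list so it doesn't depend on EPPlus (you could fill that list from `ws.Cells[headerRow, col].Value` in one loop over the columns).

```csharp
using System;
using System.Collections.Generic;

public static class HeaderIndex
{
    // Maps each non-empty header text to its 1-based column number.
    // headers[0] is assumed to be the value in column 'firstColumn'.
    public static Dictionary<string, int> Build(IReadOnlyList<object> headers, int firstColumn = 1)
    {
        // Ordinal comparison: an exact, case-sensitive match, as in the original code.
        var map = new Dictionary<string, int>(StringComparer.Ordinal);
        for (int i = 0; i < headers.Count; i++)
        {
            string text = headers[i]?.ToString()?.Trim();
            // Skip blank headers and keep the first occurrence of a duplicated header.
            if (!string.IsNullOrEmpty(text) && !map.ContainsKey(text))
                map[text] = firstColumn + i;
        }
        return map;
    }
}
```

Inside the row loop you would then use, for example, `ws.Cells[rw, map["Existing Constituent Master Id"]].Value`.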

C# OPEN XML: empty cells are getting skipped while getting data from EXCEL to DATATABLE

Task
Import data from excel to DataTable
Problem
Cells that don't contain any data are skipped, and the next cell in the row that does have data is used as the value of the empty column.
E.g.
If A1 is empty and A2 has the value Tom, then after importing, A1 gets the value of A2 and A2 remains empty.
To make it clearer, I'm providing some screenshots below.
This is the excel data
This is the DataTable after importing the data from excel
Code
public class ImportExcelOpenXml
{
public static DataTable Fill_dataTable(string fileName)
{
DataTable dt = new DataTable();
using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(fileName, false))
{
WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
string relationshipId = sheets.First().Id.Value;
WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
Worksheet workSheet = worksheetPart.Worksheet;
SheetData sheetData = workSheet.GetFirstChild<SheetData>();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
foreach (Cell cell in rows.ElementAt(0))
{
dt.Columns.Add(GetCellValue(spreadSheetDocument, cell));
}
foreach (Row row in rows) //this will also include your header row...
{
DataRow tempRow = dt.NewRow();
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
}
dt.Rows.Add(tempRow);
}
}
dt.Rows.RemoveAt(0); //...so i'm taking it out here.
return dt;
}
public static string GetCellValue(SpreadsheetDocument document, Cell cell)
{
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
string value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
else
{
return value;
}
}
}
My Thoughts
I think there is some problem with
public IEnumerable<T> Descendants<T>() where T : OpenXmlElement;
If I want the number of columns using Descendants:
IEnumerable<Row> rows = sheetData.Descendants<Row>();
int colCnt = rows.ElementAt(0).Count();
OR
If I get the number of rows using Descendants:
IEnumerable<Row> rows = sheetData.Descendants<Row>();
int rowCnt = rows.Count();
In both cases, Descendants skips the empty cells.
Is there any alternative to Descendants?
Your suggestions are highly appreciated.
P.S.: I have also thought of getting the cell values by their references, like A1 and A2, but to do that I would need the exact number of columns and rows, which I can't get with the Descendants function.

If every cell in a row contains data, everything works fine. But if a row has even a single empty cell, things go haywire.
Why does it happen in the first place?
The reason lies in this line of code:
row.Descendants<Cell>().Count()
Excel usually doesn't store a Cell element for an empty cell, so Count() gives you the number of non-empty cells in the row, i.e. it ignores all the empty cells. So when you pass row.Descendants<Cell>().ElementAt(i) to the GetCellValue method like this:
GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
it returns the content of the i-th non-empty cell, not necessarily the content of the cell at column index i. For example, if the first column is empty, ElementAt(0) returns the cell in the second column instead, and the program logic gets messed up.
Solution: we need to handle empty cells in the row, i.e. work out the actual column index of each cell from its reference, regardless of how many empty cells come before it. So replace your for loop below:
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
}
with
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
Cell cell = row.Descendants<Cell>().ElementAt(i);
int actualCellIndex = CellReferenceToIndex(cell);
tempRow[actualCellIndex] = GetCellValue(spreadSheetDocument, cell);
}
Also, add below method in your code which is used in the above modified code snippet to obtain the actual/effective column index of any cell:
private static int CellReferenceToIndex(Cell cell)
{
// Convert the column letters of the reference (e.g. "B" in "B7") to a 0-based index:
// A -> 0, Z -> 25, AA -> 26. The letters form a base-26 number with digits A..Z = 1..26.
int index = 0;
string reference = cell.CellReference.ToString().ToUpper();
foreach (char ch in reference)
{
if (Char.IsLetter(ch))
{
index = index * 26 + (ch - 'A' + 1);
}
else
{
break; // the row number starts here
}
}
return index - 1;
}
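Note: Excel column numbers start at 1 (A = 1), unlike array indexes in most programming languages, which start at 0; that's why CellReferenceToIndex subtracts 1 and returns a 0-based index that can be used directly with tempRow.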
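The underlying idea (place each value by the column in its own cell reference, not by its position among the Cell elements) can be sketched without OpenXML. In this sketch the input is a hypothetical list of (reference, value) pairs, such as you would get from `cell.CellReference.Value` and GetCellValue, and every gap is filled with an empty string; SparseRow.ToDense is an illustrative name.

```csharp
using System;
using System.Collections.Generic;

public static class SparseRow
{
    // Expands the cells that are present in a row into a fixed-width array, so that
    // a value from column C always lands at index 2 even when A and B were omitted.
    public static string[] ToDense(IEnumerable<(string Reference, string Value)> cells, int width)
    {
        var result = new string[width];
        for (int i = 0; i < width; i++)
            result[i] = ""; // empty cells are usually missing from the XML, so start blank

        foreach (var (reference, value) in cells)
        {
            // Column letters -> 0-based index: A -> 0, Z -> 25, AA -> 26.
            int column = 0;
            foreach (char ch in reference)
            {
                if (!char.IsLetter(ch)) break;
                column = column * 26 + (char.ToUpperInvariant(ch) - 'A' + 1);
            }
            int index = column - 1;
            if (index >= 0 && index < width)
                result[index] = value; // cells beyond the header width are ignored
        }
        return result;
    }
}
```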

public void Read2007Xlsx()
{
try
{
DataTable dt = new DataTable();
using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(@"D:\File.xlsx", false))
{
WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
string relationshipId = sheets.First().Id.Value;
WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
Worksheet workSheet = worksheetPart.Worksheet;
SheetData sheetData = workSheet.GetFirstChild<SheetData>();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
foreach (Cell cell in rows.ElementAt(0))
{
dt.Columns.Add(GetCellValue(spreadSheetDocument, cell));
}
foreach (Row row in rows) //this will also include your header row...
{
DataRow tempRow = dt.NewRow();
int columnIndex = 0;
foreach (Cell cell in row.Descendants<Cell>())
{
// Gets the column index of the cell with data
int cellColumnIndex = (int)GetColumnIndexFromName(GetColumnName(cell.CellReference));
cellColumnIndex--; //zero based index
if (columnIndex < cellColumnIndex)
{
do
{
tempRow[columnIndex] = ""; //Insert blank data here;
columnIndex++;
}
while (columnIndex < cellColumnIndex);
}//end if block
tempRow[columnIndex] = GetCellValue(spreadSheetDocument, cell);
columnIndex++;
}//end inner foreach loop
dt.Rows.Add(tempRow);
}//end outer foreach loop
}//end using block
dt.Rows.RemoveAt(0); //...so i'm taking it out here.
}//end try
catch (Exception ex)
{
}
}//end Read2007Xlsx method
/// <summary>
/// Given a cell name, parses the specified cell to get the column name.
/// </summary>
/// <param name="cellReference">Address of the cell (ie. B2)</param>
/// <returns>Column Name (ie. B)</returns>
public static string GetColumnName(string cellReference)
{
// Create a regular expression to match the column name portion of the cell name.
Regex regex = new Regex("[A-Za-z]+");
Match match = regex.Match(cellReference);
return match.Value;
} //end GetColumnName method
/// <summary>
/// Given just the column name (no row index), returns the 1-based column index
/// (A = 1, B = 2, ..., AA = 27). The caller subtracts 1 to get a zero-based index.
/// Column names of any length are handled.
/// </summary>
/// <param name="columnName">Column Name (ie. A or AB)</param>
/// <returns>One-based column index</returns>
public static int? GetColumnIndexFromName(string columnName)
{
//return columnIndex;
string name = columnName;
int number = 0;
int pow = 1;
for (int i = name.Length - 1; i >= 0; i--)
{
number += (name[i] - 'A' + 1) * pow;
pow *= 26;
}
return number;
} //end GetColumnIndexFromName method
public static string GetCellValue(SpreadsheetDocument document, Cell cell)
{
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
if (cell.CellValue ==null)
{
return "";
}
string value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
else
{
return value;
}
}//end GetCellValue method

foreach (Cell cell in row.Descendants<Cell>())
{
while (columnRef[i] + (dt.Rows.Count + 1) != cell.CellReference)
{
dt.Rows[dt.Rows.Count - 1][i] = "";
i += 1;
}
dt.Rows[dt.Rows.Count - 1][i] = GetValue(doc, cell);
i++;
}

Try this code. I made a few small modifications, and it worked for me:
public static DataTable Fill_dataTable(string filePath)
{
DataTable dt = new DataTable();
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(filePath, false))
{
Sheet sheet = doc.WorkbookPart.Workbook.Sheets.GetFirstChild<Sheet>();
Worksheet worksheet = ((WorksheetPart)doc.WorkbookPart.GetPartById(sheet.Id.Value)).Worksheet;
IEnumerable<Row> rows = worksheet.GetFirstChild<SheetData>().Descendants<Row>();
// Column letters of the header row, e.g. "A", "B", "C"
List<string> columnRef = new List<string>();
foreach (Row row in rows)
{
if (row.RowIndex != null)
{
if (row.RowIndex.Value == 1)
{
foreach (Cell cell in row.Descendants<Cell>())
{
// GetCellValue is the helper from the question
dt.Columns.Add(GetCellValue(doc, cell));
// Strip the row number "1" to keep just the column letters
columnRef.Add(cell.CellReference.ToString().Substring(0, cell.CellReference.ToString().Length - 1));
}
}
else
{
dt.Rows.Add();
int i = 0;
foreach (Cell cell in row.Descendants<Cell>())
{
// Write "" into the columns of skipped (empty) cells until we reach this cell's column
while (i < columnRef.Count && columnRef[i] + row.RowIndex.Value != cell.CellReference.Value)
{
dt.Rows[dt.Rows.Count - 1][i] = "";
i += 1;
}
if (i >= columnRef.Count)
break; // this cell lies beyond the header columns
dt.Rows[dt.Rows.Count - 1][i] = GetCellValue(doc, cell);
i += 1;
}
}
}
}
}
return dt;
}

Having trouble reading excel file with the OpenXML sdk

I have a function that reads from an Excel file and stores the results in a DataSet, and another function that writes to an Excel file. When I read a regular, human-created Excel file, the reading function returns a blank DataSet, but when I read the file generated by the writing function, it works perfectly. The reading function still doesn't work on a regular Excel-created file, even when I just copy and paste the contents of the generated file into it. I finally tracked the problem down to this, but I have no idea where to go from here. Is there something wrong with my code?
Here is the Excel-generating function:
public static Boolean writeToExcel(string fileName, DataSet data)
{
Boolean answer = false;
using (SpreadsheetDocument excelDoc = SpreadsheetDocument.Create(tempPath + fileName, SpreadsheetDocumentType.Workbook))
{
WorkbookPart workbookPart = excelDoc.AddWorkbookPart();
workbookPart.Workbook = new Workbook();
WorksheetPart worksheetPart = workbookPart.AddNewPart<WorksheetPart>();
Sheets sheets = excelDoc.WorkbookPart.Workbook.AppendChild<Sheets>(new Sheets());
Sheet sheet = new Sheet()
{
Id = excelDoc.WorkbookPart.GetIdOfPart(worksheetPart),
SheetId = 1,
Name = "Page1"
};
sheets.Append(sheet);
CreateWorkSheet(worksheetPart, data);
answer = true;
}
return answer;
}
private static void CreateWorkSheet(WorksheetPart worksheetPart, DataSet data)
{
Worksheet worksheet = new Worksheet();
SheetData sheetData = new SheetData();
UInt32Value currRowIndex = 1U;
int colIndex = 0;
Row excelRow;
DataTable table = data.Tables[0];
for (int rowIndex = -1; rowIndex < table.Rows.Count; rowIndex++)
{
excelRow = new Row();
excelRow.RowIndex = currRowIndex++;
for (colIndex = 0; colIndex < table.Columns.Count; colIndex++)
{
Cell cell = new Cell()
{
CellReference = Convert.ToString(Convert.ToChar(65 + colIndex)),
DataType = CellValues.String
};
CellValue cellValue = new CellValue();
if (rowIndex == -1)
{
cellValue.Text = table.Columns[colIndex].ColumnName.ToString();
}
else
{
cellValue.Text = (table.Rows[rowIndex].ItemArray[colIndex].ToString() != "") ? table.Rows[rowIndex].ItemArray[colIndex].ToString() : "*";
}
cell.Append(cellValue);
excelRow.Append(cell);
}
sheetData.Append(excelRow);
}
SheetFormatProperties formattingProps = new SheetFormatProperties()
{
DefaultColumnWidth = 20D,
DefaultRowHeight = 20D
};
worksheet.Append(formattingProps);
worksheet.Append(sheetData);
worksheetPart.Worksheet = worksheet;
}
And the reading function is as follows:
public static void readInventoryExcel(string fileName, ref DataSet set)
{
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
{
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
int count = -1;
foreach (Row r in sheetData.Elements<Row>())
{
if (count >= 0)
{
DataRow row = set.Tables[0].NewRow();
row["SerialNumber"] = r.ChildElements[1].InnerXml;
row["PartNumber"] = r.ChildElements[2].InnerXml;
row["EntryDate"] = r.ChildElements[3].InnerXml;
row["RetirementDate"] = r.ChildElements[4].InnerXml;
row["ReasonForReplacement"] = r.ChildElements[5].InnerXml;
row["RetirementTech"] = r.ChildElements[6].InnerXml;
row["IncludeInMaintenance"] = r.ChildElements[7].InnerXml;
row["MaintenanceTech"] = r.ChildElements[8].InnerXml;
row["Comment"] = r.ChildElements[9].InnerXml;
row["Station"] = r.ChildElements[10].InnerXml;
row["LocationStatus"] = r.ChildElements[11].InnerXml;
row["AssetName"] = r.ChildElements[12].InnerXml;
row["InventoryType"] = r.ChildElements[13].InnerXml;
row["Description"] = r.ChildElements[14].InnerXml;
set.Tables[0].Rows.Add(row);
}
count++;
}
}
}

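I think this is caused by the fact that your file has only one sheet, whereas a workbook created by Excel has three. I'm not certain, but I think the sheets are returned in reverse order (the order of WorkbookPart.WorksheetParts isn't guaranteed to match the order of the sheet tabs), so you should change the line: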
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
to
WorksheetPart worksheetPart = workbookPart.WorksheetParts.Last();
It might be safer to look up the WorksheetPart by sheet name, if you can identify the sheet that way. You need to find the Sheet first and then use its Id to find the WorksheetPart:
private WorksheetPart GetWorksheetPartBySheetName(WorkbookPart workbookPart, string sheetName)
{
//find the sheet first.
IEnumerable<Sheet> sheets = workbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>().Where(s => s.Name == sheetName);
if (sheets.Count() > 0)
{
string relationshipId = sheets.First().Id.Value;
WorksheetPart worksheetPart = (WorksheetPart)workbookPart.GetPartById(relationshipId);
return worksheetPart;
}
return null;
}
You can then use:
WorksheetPart worksheetPart = GetWorksheetPartBySheetName(workbookPart, "Sheet1");
There are a couple of other things I've noticed whilst looking at your code which you may (or may not!) be interested in:
In your code you are only reading the InnerXml, so it might not matter to you, but Excel stores strings differently from the way you are writing them, so reading an Excel-generated file may not give you the values you expect. In your example, you are writing the string directly into the cell like this:
But Excel uses a shared strings concept, where all strings are written to a separate XML file called sharedStrings.xml. That file lists the strings used in the workbook, and the cell value in the sheet XML stores the position of the string in that list.
The sharedStrings.xml file looks like this:
And the Cell then looks like this:
The 47 in the <v> element is a zero-based index into the shared strings, so it refers to the 48th string. Note that the type (the t attribute) in your generated XML is str, but the type in the Excel-generated file is s. This denotes that yours is a string stored directly in the cell (strictly, str is the type for a formula's string result; a true inline string uses t="inlineStr"), whereas theirs is a shared string.
You can read the SharedStrings just as you would any other part:
var stringTable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
if (stringTable != null)
{
sharedString = stringTable.SharedStringTable.ElementAt(int.Parse(value)).InnerText;
}
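To make the difference between t="s" and t="str" concrete, here is a sketch of how a reader could turn a cell's type and raw <v> text into the displayed value. It rests on assumptions: CellText.Resolve is a hypothetical helper, the shared strings are assumed to be preloaded into a list (for example with `stringTable.SharedStringTable.Elements<SharedStringItem>().Select(si => si.InnerText).ToList()`), and only s and b get special handling; str and numbers are returned as stored.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CellText
{
    // 'type' is the cell's t attribute (null when absent), 'raw' is the text of its <v> element.
    public static string Resolve(string type, string raw, IReadOnlyList<string> sharedStrings)
    {
        if (raw == null)
            return "";
        switch (type)
        {
            case "s":
                // Shared string: <v> holds a zero-based index into sharedStrings.xml.
                int index = int.Parse(raw, CultureInfo.InvariantCulture);
                return sharedStrings[index];
            case "b":
                // Boolean: stored as 0 or 1 (the TRUE/FALSE spelling is this sketch's choice).
                return raw == "1" ? "TRUE" : "FALSE";
            default:
                // "str" (what the writing function emits), numbers (no t attribute) and
                // other types: the text in <v> is already the value.
                return raw;
        }
    }
}
```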
Secondly, if you compare the cell references your code generates with the ones Excel generates, you can see that you are only outputting the column and not the row (e.g. you output A instead of A1). To fix this, change the line:
CellReference = Convert.ToString(Convert.ToChar(65 + colIndex)),
to
CellReference = Convert.ToChar(65 + colIndex).ToString() + excelRow.RowIndex.Value.ToString(),
Use excelRow.RowIndex (the 1-based Excel row number) rather than the loop variable rowIndex, which starts at -1 for the header row and would produce references such as A-1 and A0.
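Note that Convert.ToChar(65 + colIndex) only covers columns A to Z; a DataTable with more than 26 columns would produce "[" for the 27th column. Here is a sketch of a general conversion; CellReferenceBuilder.Build is a hypothetical helper that takes the 0-based column index used in the loop and the 1-based Excel row number.

```csharp
using System;

public static class CellReferenceBuilder
{
    // Builds an A1-style reference: (0, 1) -> "A1", (25, 3) -> "Z3", (26, 3) -> "AA3".
    public static string Build(int columnIndex, uint rowNumber)
    {
        if (columnIndex < 0) throw new ArgumentOutOfRangeException(nameof(columnIndex));
        if (rowNumber == 0) throw new ArgumentOutOfRangeException(nameof(rowNumber));

        string letters = "";
        int n = columnIndex + 1; // switch to 1-based for the bijective base-26 conversion
        while (n > 0)
        {
            int remainder = (n - 1) % 26;
            letters = (char)('A' + remainder) + letters;
            n = (n - 1) / 26;
        }
        return letters + rowNumber.ToString();
    }
}
```

In CreateWorkSheet this would be `CellReference = CellReferenceBuilder.Build(colIndex, excelRow.RowIndex.Value),`.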
I hope that helps.

I ran into a similar issue a while back trying to do this for Word documents (procedurally generated worked fine, but human-generated did not). I found this tool to be very helpful:
http://www.microsoft.com/en-us/download/details.aspx?id=30425
Basically, it opens a file and shows you the XML structure of each part, as well as the SDK code that would generate it. As usual for Microsoft products, there are quite a few menus and it's not very intuitive, but after clicking around for a bit you will be able to see exactly what is going on with any two files. I would recommend opening a working Excel file and a non-working one and comparing them to see what's causing your issue.

Below is the OpenXML code that I use to read a particular Worksheet from an Excel file into a DataTable.
First, here's how you'd call it:
DataTable dt = OpenXMLHelper.ExcelWorksheetToDataTable("C:\\SQL Server\\SomeExcelFile.xlsx", "Mikes Worksheet");
And here's the code:
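using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
public class OpenXMLHelper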
{
public static DataTable ExcelWorksheetToDataTable(string pathFilename, string worksheetName)
{
DataTable dt = new DataTable(worksheetName);
using (SpreadsheetDocument document = SpreadsheetDocument.Open(pathFilename, false))
{
// Find the sheet with the supplied name, and then use that
// Sheet object to retrieve a reference to the first worksheet.
Sheet theSheet = document.WorkbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == worksheetName).FirstOrDefault();
if (theSheet == null)
throw new Exception("Couldn't find the worksheet: " + worksheetName);
// Retrieve a reference to the worksheet part.
WorksheetPart wsPart = (WorksheetPart)(document.WorkbookPart.GetPartById(theSheet.Id));
Worksheet workSheet = wsPart.Worksheet;
string dimensions = workSheet.SheetDimension.Reference.InnerText; // Get the dimensions of this worksheet, eg "B2:F4"
int numOfColumns = 0;
int numOfRows = 0;
CalculateDataTableSize(dimensions, ref numOfColumns, ref numOfRows);
System.Diagnostics.Trace.WriteLine(string.Format("The worksheet \"{0}\" has dimensions \"{1}\", so we need a DataTable of size {2}x{3}.", worksheetName, dimensions, numOfColumns, numOfRows));
SheetData sheetData = workSheet.GetFirstChild<SheetData>();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
string[,] cellValues = new string[numOfColumns, numOfRows];
int colInx = 0;
int rowInx = 0;
string value = "";
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
// Iterate through each row of OpenXML data
foreach (Row row in rows)
{
// *DON'T* assume there's going to be one XML element for each item in each row:
// empty cells are often omitted, so position each value using its CellReference.
foreach (Cell cell in row.Descendants<Cell>())
{
if (cell.CellValue == null || cell.CellReference == null)
continue; // eg when an Excel cell contains a blank string
// Convert this Excel cell's CellAddress into a 0-based offset into our array (eg "G13" -> [6, 12])
colInx = GetColumnIndexByName(cell.CellReference); // eg "C" -> 2 (0-based)
rowInx = GetRowIndexFromCellAddress(cell.CellReference) - 1; // Needs to be 0-based
// Fetch the value in this cell (InnerText, so XML escapes such as &amp; are decoded)
value = cell.CellValue.InnerText;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
value = stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
cellValues[colInx, rowInx] = value;
}
}
// Copy the array of strings into a DataTable
for (int col = 0; col < numOfColumns; col++)
dt.Columns.Add("Column_" + col.ToString());
for (int row = 0; row < numOfRows; row++)
{
DataRow dataRow = dt.NewRow();
for (int col = 0; col < numOfColumns; col++)
{
dataRow.SetField(col, cellValues[col, row]);
}
dt.Rows.Add(dataRow);
}
#if DEBUG
// Write out the contents of our DataTable to the Output window (for debugging)
string str = "";
for (rowInx = 0; rowInx < numOfRows; rowInx++)
{
for (colInx = 0; colInx < numOfColumns; colInx++)
{
object val = dt.Rows[rowInx].ItemArray[colInx];
str += (val == null) ? "" : val.ToString();
str += "\t";
}
str += "\n";
}
System.Diagnostics.Trace.WriteLine(str);
#endif
return dt;
}
}
private static void CalculateDataTableSize(string dimensions, ref int numOfColumns, ref int numOfRows)
{
// How many columns & rows of data does this Worksheet contain ?
// We'll read in the Dimensions string from the Excel file, and calculate the size based on that.
// eg "B1:F4" -> we'll need 6 columns and 4 rows.
//
// (We deliberately ignore the top-left cell address, and just use the bottom-right cell address.)
try
{
string[] parts = dimensions.Split(':'); // eg "B1:F4", or just "A1" for an empty or single-cell sheet
if (parts.Length > 2)
throw new Exception("Expected one or two CellAddresses in the dimension");
string bottomRight = parts[parts.Length - 1]; // the last address is the bottom-right corner
numOfColumns = 1 + GetColumnIndexByName(bottomRight); // 0-based index + 1, so F4 would return 6 columns
numOfRows = GetRowIndexFromCellAddress(bottomRight);
}
catch (Exception ex)
{
throw new Exception("Could not calculate maximum DataTable size from the worksheet dimension: " + dimensions, ex);
}
}
public static int GetRowIndexFromCellAddress(string cellAddress)
{
// Extract the 1-based row number from an Excel CellReference
// eg "D42" -> 42
// "F123" -> 123
string rowNumber = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^0-9]", "");
return int.Parse(rowNumber);
}
public static int GetColumnIndexByName(string cellAddress)
{
// Convert an Excel CellReference column into a 0-based column index
// eg "D42" -> 3
// "F123" -> 5
var columnName = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^A-Z]", "");
int number = 0, pow = 1;
for (int i = columnName.Length - 1; i >= 0; i--)
{
number += (columnName[i] - 'A' + 1) * pow;
pow *= 26;
}
return number - 1;
}
}
Just to mention, some of our company's Excel Worksheets have one or more blank rows at the top. Strangely, this prevented some other OpenXML libraries from reading in such Worksheets properly.
This code deliberately creates a DataTable with one value for each of the cells in the Worksheet, even the blank ones at the top.
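The GetColumnIndexByName / GetRowIndexFromCellAddress idea also answers the original question (all cells from A1 to E11), and it can replace the size calculation when a file has no <dimension> element: that element is optional in the file format, in which case workSheet.SheetDimension is null and the code above would throw. Below is a minimal SDK-free sketch, written as a .NET 6+ top-level program; ParseCellReference, IsInRange and GetUsedSize are hypothetical helpers, and only plain A1-style references (no $ or sheet names) are handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: splits "E11" into a 1-based column number (5) and row number (11).
static (int Column, int Row) ParseCellReference(string reference)
{
    Match m = Regex.Match(reference, "^([A-Z]+)([0-9]+)$");
    if (!m.Success) throw new FormatException($"Not an A1-style cell reference: {reference}");
    int column = 0;
    foreach (char c in m.Groups[1].Value)
        column = column * 26 + (c - 'A' + 1); // bijective base 26: "A" = 1, "AA" = 27
    return (column, int.Parse(m.Groups[2].Value));
}

// Hypothetical helper: true when 'reference' lies inside the rectangle topLeft:bottomRight.
static bool IsInRange(string reference, string topLeft, string bottomRight)
{
    var (col, row) = ParseCellReference(reference);
    var (minCol, minRow) = ParseCellReference(topLeft);
    var (maxCol, maxRow) = ParseCellReference(bottomRight);
    return col >= minCol && col <= maxCol && row >= minRow && row <= maxRow;
}

// Hypothetical helper: the (columns, rows) needed to hold every referenced cell,
// counted from A1 like CalculateDataTableSize, but without relying on <dimension>.
static (int Columns, int Rows) GetUsedSize(IEnumerable<string> cellReferences)
{
    int maxCol = 0, maxRow = 0;
    foreach (string reference in cellReferences)
    {
        var (col, row) = ParseCellReference(reference);
        maxCol = Math.Max(maxCol, col);
        maxRow = Math.Max(maxRow, row);
    }
    return (maxCol, maxRow);
}

Console.WriteLine(IsInRange("C7", "A1", "E11")); // True
Console.WriteLine(IsInRange("F2", "A1", "E11")); // False
var size = GetUsedSize(new[] { "B3", "F4", "C10" });
Console.WriteLine($"{size.Columns} x {size.Rows}"); // 6 x 10
```

With the SDK, the query from the question could then be written roughly as sheetData.Descendants<Cell>().Where(c => c.CellReference != null && IsInRange(c.CellReference.Value, "A1", "E11")). Keep in mind that it only returns cells that physically exist in the XML; blank cells are often omitted, so the count can be lower than the 55 cells in A1:E11.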