How to remove row from Excel using OpenXML - c#

I'm trying to delete row using the following code:
SpreadsheetDocument document = SpreadsheetDocument.Open( "test.xlsx", true );
IEnumerable<Sheet> sheets = document.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>().Where( s => s.Name == "sheet1" );
string relationshipId = sheets.First().Id.Value;
WorksheetPart worksheetPart = ( WorksheetPart )document.WorkbookPart.GetPartById( relationshipId );
IEnumerable<Row> rows = worksheetPart.Worksheet.GetFirstChild<SheetData>().Elements<Row>();
Row row = rows.FirstOrDefault<Row>();
row.Remove();
worksheetPart.Worksheet.Save();
document.Close();
But all I get is cleaning the first cell. I need to delete row with offset up other rows.
How can I do this?
Before
After
What I want to get:

I solved the problem.
Attention:
I do not have much experience, so the following code may not be effective
Row row = rows.FirstOrDefault<Row>();
row.Remove();
string cr;
foreach ( Row rowElement in rows )
{
rowElement.RowIndex.Value -= 1;
IEnumerable<Cell> cells = rowElement.Elements<Cell>().ToList();
if ( cells != null )
{
foreach ( Cell cell in cells )
{
cr = cell.CellReference.Value;
int indexRow = Convert.ToInt32( Regex.Replace( cr, #"[^\d]+", "" ) ) - 1;
cr = Regex.Replace( cr, #"[\d-]", "" );
cell.CellReference.Value = $"{cr}{indexRow}";
}
}
}

This is how I managed to removed empty rows
using (SpreadsheetDocument document = SpreadsheetDocument.Open(pathToFile, true))
{
WorkbookPart wbPart = document.WorkbookPart;
var worksheet = wbPart.WorksheetParts.First().Worksheet;
var rows = worksheet.GetFirstChild<SheetData>().Elements<Row>();
// Skip headers
foreach (var row in rows.Skip(1))
{
if (/* some condition on which rows to delete*/)
{
row.Remove();
}
}
// Fix all row indexes
string cr;
for (int i = 2; i < rows.Count(); i++)
{
var newCurrentRowIndex = rows.ElementAt(i - 1).RowIndex.Value + 1;
var currentRow = rows.ElementAt(i);
currentRow.RowIndex.Value = updatedRowIndex;
IEnumerable<Cell> cells = currentRow.Elements<Cell>().ToList();
if (cells != null)
{
foreach (Cell cell in cells)
{
cr = cell.CellReference.Value;
cr = Regex.Replace(cell.CellReference.Value, #"[\d-]", "");
cell.CellReference.Value = $"{cr}{updatedRowIndex}";
}
}
}
worksheet.Save();
}

Related

C# OPEN XML: empty cells are getting skipped while getting data from EXCEL to DATATABLE

Task
Import data from excel to DataTable
Problem
The cell that doesnot contain any data are getting skipped and the very next cell that has data in the row is used as the value of the empty colum.
E.g
A1 is empty A2 has a value Tom then while importing the data A1 get the value of A2 and A2 remains empty
To make it very clear I am providing some screen shots below
This is the excel data
This is the DataTable after importing the data from excel
Code
public class ImportExcelOpenXml
{
public static DataTable Fill_dataTable(string fileName)
{
DataTable dt = new DataTable();
using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(fileName, false))
{
WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
string relationshipId = sheets.First().Id.Value;
WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
Worksheet workSheet = worksheetPart.Worksheet;
SheetData sheetData = workSheet.GetFirstChild<SheetData>();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
foreach (Cell cell in rows.ElementAt(0))
{
dt.Columns.Add(GetCellValue(spreadSheetDocument, cell));
}
foreach (Row row in rows) //this will also include your header row...
{
DataRow tempRow = dt.NewRow();
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
}
dt.Rows.Add(tempRow);
}
}
dt.Rows.RemoveAt(0); //...so i'm taking it out here.
return dt;
}
public static string GetCellValue(SpreadsheetDocument document, Cell cell)
{
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
string value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
else
{
return value;
}
}
}
My Thoughts
I think there is some problem with
public IEnumerable<T> Descendants<T>() where T : OpenXmlElement;
In case I want the count of columns using Descendants
IEnumerable<Row> rows = sheetData.Descendants<<Row>();
int colCnt = rows.ElementAt(0).Count();
OR
If I am getting the count of rows using Descendants
IEnumerable<Row> rows = sheetData.Descendants<<Row>();
int rowCnt = rows.Count();`
In both cases Descendants is skipping the empty cells
Is there any alternative of Descendants.
Your suggestions are highly appreciated
P.S: I have also thought of getting the cells values by using column names like A1, A2 but in order to do that I will have to get the exact count of columns and rows which is not possible by using Descendants function.
Had there been some data in all the cells of a row then everything works fine. But if you happen to have even single empty cell in a row then things go haywire.
Why it is happening in first place?
The reason lies in below line of code:
row.Descendants<Cell>().Count()
Count() function gives you the number of non-empty cells in the row i.e. it will ignore all the empty cells while returning the count. So, when you pass row.Descendants<Cell>().ElementAt(i) as argument to GetCellValue method like this:
GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
Then, it will find the content of the next non-empty cell, not necessarily the content of the cell at column index i e.g. if the first column is empty and we call ElementAt(1), it returns the value in the second column instead and our program logic gets messed up.
Solution: We need to deal with the occurrence of empty cells in the row i.e. we need to figure out the actual/effective column index of the target cell in case there were some empty cells before it in the given row. So, you need to substitute your for loop code below:
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
}
with
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
Cell cell = row.Descendants<Cell>().ElementAt(i);
int actualCellIndex = CellReferenceToIndex(cell);
tempRow[actualCellIndex] = GetCellValue(spreadSheetDocument, cell);
}
Also, add below method in your code which is used in the above modified code snippet to obtain the actual/effective column index of any cell:
private static int CellReferenceToIndex(Cell cell)
{
int index = 0;
string reference = cell.CellReference.ToString().ToUpper();
foreach (char ch in reference)
{
if (Char.IsLetter(ch))
{
int value = (int)ch - (int)'A';
index = (index == 0) ? value : ((index + 1) * 26) + value;
}
else
{
return index;
}
}
return index;
}
Note: Index in an Excel row start with 1 unlike various programming languages where it starts at 0.
public void Read2007Xlsx()
{
try
{
DataTable dt = new DataTable();
using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(#"D:\File.xlsx", false))
{
WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
string relationshipId = sheets.First().Id.Value;
WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
Worksheet workSheet = worksheetPart.Worksheet;
SheetData sheetData = workSheet.GetFirstChild<SheetData>();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
foreach (Cell cell in rows.ElementAt(0))
{
dt.Columns.Add(GetCellValue(spreadSheetDocument, cell));
}
foreach (Row row in rows) //this will also include your header row...
{
DataRow tempRow = dt.NewRow();
int columnIndex = 0;
foreach (Cell cell in row.Descendants<Cell>())
{
// Gets the column index of the cell with data
int cellColumnIndex = (int)GetColumnIndexFromName(GetColumnName(cell.CellReference));
cellColumnIndex--; //zero based index
if (columnIndex < cellColumnIndex)
{
do
{
tempRow[columnIndex] = ""; //Insert blank data here;
columnIndex++;
}
while (columnIndex < cellColumnIndex);
}//end if block
tempRow[columnIndex] = GetCellValue(spreadSheetDocument, cell);
columnIndex++;
}//end inner foreach loop
dt.Rows.Add(tempRow);
}//end outer foreach loop
}//end using block
dt.Rows.RemoveAt(0); //...so i'm taking it out here.
}//end try
catch (Exception ex)
{
}
}//end Read2007Xlsx method
/// <summary>
/// Given a cell name, parses the specified cell to get the column name.
/// </summary>
/// <param name="cellReference">Address of the cell (ie. B2)</param>
/// <returns>Column Name (ie. B)</returns>
public static string GetColumnName(string cellReference)
{
// Create a regular expression to match the column name portion of the cell name.
Regex regex = new Regex("[A-Za-z]+");
Match match = regex.Match(cellReference);
return match.Value;
} //end GetColumnName method
/// <summary>
/// Given just the column name (no row index), it will return the zero based column index.
/// Note: This method will only handle columns with a length of up to two (ie. A to Z and AA to ZZ).
/// A length of three can be implemented when needed.
/// </summary>
/// <param name="columnName">Column Name (ie. A or AB)</param>
/// <returns>Zero based index if the conversion was successful; otherwise null</returns>
public static int? GetColumnIndexFromName(string columnName)
{
//return columnIndex;
string name = columnName;
int number = 0;
int pow = 1;
for (int i = name.Length - 1; i >= 0; i--)
{
number += (name[i] - 'A' + 1) * pow;
pow *= 26;
}
return number;
} //end GetColumnIndexFromName method
public static string GetCellValue(SpreadsheetDocument document, Cell cell)
{
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
if (cell.CellValue ==null)
{
return "";
}
string value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
else
{
return value;
}
}//end GetCellValue method
foreach (Cell cell in row.Descendants<Cell>())
{
while (columnRef[i] + (dt.Rows.Count + 1) != cell.CellReference)
{
dt.Rows[dt.Rows.Count - 1][i] = "";
i += 1;
}
dt.Rows[dt.Rows.Count - 1][i] = GetValue(doc, cell);
i++;
}
Try this code. I have done little modifications and it worked for me:
public static DataTable Fill_dataTable(string filePath)
{
DataTable dt = new DataTable();
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(filePath, false))
{
Sheet sheet = doc.WorkbookPart.Workbook.Sheets.GetFirstChild<Sheet>();
Worksheet worksheet = doc.WorkbookPart.GetPartById(sheet.Id.Value) as WorksheetPart.Worksheet;
IEnumerable<Row> rows = worksheet.GetFirstChild<SheetData>().Descendants<Row>();
DataTable dt = new DataTable();
List<string> columnRef = new List<string>();
foreach (Row row in rows)
{
if (row.RowIndex != null)
{
if (row.RowIndex.Value == 1)
{
foreach (Cell cell in row.Descendants<Cell>())
{
dt.Columns.Add(GetValue(doc, cell));
columnRef.Add(cell.CellReference.ToString().Substring(0, cell.CellReference.ToString().Length - 1));
}
}
else
{
dt.Rows.Add();
int i = 0;
foreach (Cell cell in row.Descendants<Cell>())
{
while (columnRef(i) + dt.Rows.Count + 1 != cell.CellReference)
{
dt.Rows(dt.Rows.Count - 1)(i) = "";
i += 1;
}
dt.Rows(dt.Rows.Count - 1)(i) = GetValue(doc, cell);
i += 1;
}
}
}
}
}
return dt;
}

Data truncated while importing from Excel using OLEDB, what are the alternatives?

I have a WPF application where I have functionality to read data from Excel.
I was doing this using OLEDB and it was working great, until I found out there was a 255 limit for columns and the data would be truncated unless data > 255 characters is not present in the first eight rows. Fix for this issue is to update the registry which would mean updating all users' registries. So I don't want to go with that approach.
OLEDB code:
string strSQL = "SELECT * FROM [Sheet1$]";
OleDbCommand cmd = new OleDbCommand(strSQL, conn);
DataSet ds1 = new DataSet();
OleDbDataAdapter da = new OleDbDataAdapter(cmd);
da.Fill(ds1);
As an alternative, I tried Interop.Excel . However, it seems to be slower that OLEDB. The Excel sheets that were taking 2 seconds to load take about 15 seconds using Interop.Excel.
System.Data.DataTable tempTable = new System.Data.DataTable();
tempTable.TableName = "ResultData";
Excel.Application app = new Excel.Application();
Excel.Workbook book = null;
Excel.Range range = null;
try
{
app.Visible = false;
app.ScreenUpdating = false;
app.DisplayAlerts = false;
book = app.Workbooks.Open(inputFilePath, Missing.Value, Missing.Value, Missing.Value
, Missing.Value, Missing.Value, Missing.Value, Missing.Value
, Missing.Value, Missing.Value, Missing.Value, Missing.Value
, Missing.Value, Missing.Value, Missing.Value);
foreach (Excel.Worksheet sheet in book.Worksheets)
{
Logger.LogException("Values for Sheet " + sheet.Index, System.Reflection.MethodBase.GetCurrentMethod().ToString());
// get a range to work with
range = sheet.get_Range("A1", Missing.Value);
// get the end of values to the right (will stop at the first empty cell)
range = range.get_End(Excel.XlDirection.xlToRight);
// get the end of values toward the bottom, looking in the last column (will stop at first empty cell)
range = range.get_End(Excel.XlDirection.xlDown);
// get the address of the bottom, right cell
string downAddress = range.get_Address(
false, false, Excel.XlReferenceStyle.xlA1,
Type.Missing, Type.Missing);
// Get the range, then values from a1
range = sheet.get_Range("A1", downAddress);
object[,] values = (object[,])range.Value2;
//Get the Column Names
for (int k = 0; k < values.GetLength(1); )
{
tempTable.Columns.Add(Convert.ToString(values[1, ++k]).Trim());
}
for (int i = 2; i <= values.GetLength(0); i++)//first row contains the column names, so start from the next row.
{
System.Data.DataRow dr = tempTable.NewRow();
for (int j = 1; j <= values.GetLength(1); j++)//columns
{
dr[j - 1] = values[i, j];
}
tempTable.Rows.Add(dr);
}
}
Is there another alternative which is as fast as OLEDB?
The columns and rows are not fixed in the Excel sheet.
If you are working with xlsx files I would recommend switching to the Open XML SDK for Office, it performs much better than OLEDB or the Interop methods of connecting.
However, some people consider the SDK difficult to use so there are 3rd party packages that will wrap up the SDK into a more friendly interface, but personally I don't find the SDK too hard to do.
Thank you for your response.Here is the final code that I used:
Reference link : http://www.prowareness.com/blog/reading-data-from-excel-document-using-openxml/
public static DataSet ExtractExcelSheetValuesToDataTable(string xlsxFilePath, string sheetName)
{
DataTable dt = new DataTable();
DataSet ds = new DataSet();
using (SpreadsheetDocument myWorkbook = SpreadsheetDocument.Open(xlsxFilePath, true))
{
//Access the main Workbook part, which contains data
WorkbookPart workbookPart = myWorkbook.WorkbookPart;
WorksheetPart worksheetPart = null;
if (!string.IsNullOrEmpty(sheetName))
{
Sheet ss = workbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == sheetName).SingleOrDefault<Sheet>();
worksheetPart = (WorksheetPart)workbookPart.GetPartById(ss.Id);
}
else
{
worksheetPart = workbookPart.WorksheetParts.FirstOrDefault();
}
SharedStringTablePart stringTablePart = workbookPart.SharedStringTablePart;
if (worksheetPart != null)
{
Row lastRow = worksheetPart.Worksheet.Descendants<Row>().LastOrDefault();
Row firstRow = worksheetPart.Worksheet.Descendants<Row>().FirstOrDefault();
if (firstRow != null)
{
foreach (Cell c in firstRow.ChildElements)
{
string value = GetValue(c, stringTablePart);
dt.Columns.Add(value);
}
}
if (lastRow != null)
{
for (int i = 2; i <= lastRow.RowIndex; i++)
{
DataRow dr = dt.NewRow();
bool empty = true;
Row row = worksheetPart.Worksheet.Descendants<Row>().Where(r => i == r.RowIndex).FirstOrDefault();
int j = 0;
if (row != null)
{
foreach (Cell c in row.Descendants<Cell>())
{
int? colIndex = GetColumnIndex(((DocumentFormat.OpenXml.Spreadsheet.CellType)(c)).CellReference);
if (colIndex > j)
{
dr[j] = "";
j++;
}
//Get cell value
string value = GetValue(c, stringTablePart);
//if (!string.IsNullOrEmpty(value))
// empty = false;
dr[j] = value;
j++;
if (j == dt.Columns.Count)
break;
}
//foreach (Cell c in row.ChildElements)
//{
// //Get cell value
// string value = GetValue(c, stringTablePart);
// //if (!string.IsNullOrEmpty(value))
// // empty = false;
// dr[j] = value;
// j++;
// if (j == dt.Columns.Count)
// break;
//}
//if (empty)
// break;
dt.Rows.Add(dr);
}
}
}
}
}
ds.Tables.Add(dt);
return ds;
}
public static string GetValue(Cell cell, SharedStringTablePart stringTablePart)
{
if (cell.ChildElements.Count == 0) return null;
//get cell value
string value = cell.ElementAt(0).InnerText;//CellValue.InnerText;
//Look up real value from shared string table
if ((cell.DataType != null) && (cell.DataType == CellValues.SharedString))
value = stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
return value;
}
private static int? GetColumnIndex(string cellReference)
{
if (string.IsNullOrEmpty(cellReference))
{
return null;
}
//remove digits
string columnReference = Regex.Replace(cellReference.ToUpper(), #"[\d]", string.Empty);
int columnNumber = -1;
int mulitplier = 1;
//working from the end of the letters take the ASCII code less 64 (so A = 1, B =2...etc)
//then multiply that number by our multiplier (which starts at 1)
//multiply our multiplier by 26 as there are 26 letters
foreach (char c in columnReference.ToCharArray().Reverse())
{
columnNumber += mulitplier * ((int)c - 64);
mulitplier = mulitplier * 26;
}
//the result is zero based so return columnnumber + 1 for a 1 based answer
//this will match Excel's COLUMN function
return columnNumber;
}

OpenXML how to get cell in range

Please help me to get cell in range (ex from A:1 to E:11 are all cells in rectangular).
For now, my ideal is
Worksheet worksheet = GetWorksheet(document, sheetName);
SheetData sheetData = worksheet.GetFirstChild<SheetData>();
IEnumerable<Cell> cells = sheetData.Descendants<Cell>().Where(c =>
c.CellReference >= A:1 &&
c.CellReference <= E:11 &&
);
int t = cells.Count();
But this code does not work.
Thanks
It won't be that easy to compare cell's CellReference with a string. And yes, what you are currently doing is wrong. You simply cannot compare strings for Higher or Lower in such a way.
You have two options.
Option 1 :
You can take cell reference and break it down. That means separate characters and numbers and then give them values individually and compare
A1 - > A and 1 -> Give A =1 so you have 1 and 1
E11 -> E and 11 -> Give E = 5 so you have 5 and 11
So you will need to breakdown the CellReference and check the validity for your requirement.
Option 2 :
If you notice above it's simply we take a 2D matrix index (ex : 1,1 and 5,11 which are COLUMN,ROW format). You can simply use this feature in comparison. But catch is you cannot use LINQ for this, you need to iterate through rows and columns. I tried to give following example code, try it
using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open("PATH", true))
{
//Get workbookpart
WorkbookPart workbookPart = myDoc.WorkbookPart;
// Extract the workbook part
var stringtable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
//then access to the worksheet part
IEnumerable<WorksheetPart> worksheetPart = workbookPart.WorksheetParts;
foreach (WorksheetPart WSP in worksheetPart)
{
//find sheet data
IEnumerable<SheetData> sheetData = WSP.Worksheet.Elements<SheetData>();
int RowCount = 0;
int CellCount = 0;
// This is A1
int RowMin = 1;
int ColMin = 1;
//This is E11
int RowMax = 11;
int ColMax = 5;
foreach (SheetData SD in sheetData)
{
foreach (Row row in SD.Elements<Row>())
{
RowCount++; // We are in a new row
// For each cell we need to identify type
foreach (Cell cell in row.Elements<Cell>())
{
// We are in a new Cell
CellCount++;
if ((RowCount >= RowMin && CellCount >= ColMin) && (RowCount <= RowMax && CellCount <= ColMax))
{
if (cell.DataType == null && cell.CellValue != null)
{
// Check for pure numbers
Console.WriteLine(cell.CellValue.Text);
}
else if (cell.DataType.Value == CellValues.Boolean)
{
// Booleans
Console.WriteLine(cell.CellValue.Text);
}
else if (cell.CellValue != null)
{
// A shared string
if (stringtable != null)
{
// Cell value holds the shared string location
Console.WriteLine(stringtable.SharedStringTable.ElementAt(int.Parse(cell.CellValue.Text)).InnerText);
}
}
else
{
Console.WriteLine("A broken book");
}
}
}
// Reset Cell count
CellCount = 0;
}
}
}
}
This actually work. I tested.

Having trouble reading excel file with the OpenXML sdk

I have a function that reads from an excel file and stores the results in a DataSet. I have another function that writes to an excel file. When I try to read from a regular human-generated excel file, the excel reading function returns a blank DataSet, but when I read from the excel file generated by the writing function, it works perfectly fine. The function then will not work on a regular generated excel file, even when I just copy and paste the contents of the function generated excel file. I finally tracked it down to this, but I have no idea where to go from here. Is there something wrong with my code?
Here is the excel generating function:
public static Boolean writeToExcel(string fileName, DataSet data)
{
Boolean answer = false;
using (SpreadsheetDocument excelDoc = SpreadsheetDocument.Create(tempPath + fileName, SpreadsheetDocumentType.Workbook))
{
WorkbookPart workbookPart = excelDoc.AddWorkbookPart();
workbookPart.Workbook = new Workbook();
WorksheetPart worksheetPart = workbookPart.AddNewPart<WorksheetPart>();
Sheets sheets = excelDoc.WorkbookPart.Workbook.AppendChild<Sheets>(new Sheets());
Sheet sheet = new Sheet()
{
Id = excelDoc.WorkbookPart.GetIdOfPart(worksheetPart),
SheetId = 1,
Name = "Page1"
};
sheets.Append(sheet);
CreateWorkSheet(worksheetPart, data);
answer = true;
}
return answer;
}
private static void CreateWorkSheet(WorksheetPart worksheetPart, DataSet data)
{
Worksheet worksheet = new Worksheet();
SheetData sheetData = new SheetData();
UInt32Value currRowIndex = 1U;
int colIndex = 0;
Row excelRow;
DataTable table = data.Tables[0];
for (int rowIndex = -1; rowIndex < table.Rows.Count; rowIndex++)
{
excelRow = new Row();
excelRow.RowIndex = currRowIndex++;
for (colIndex = 0; colIndex < table.Columns.Count; colIndex++)
{
Cell cell = new Cell()
{
CellReference = Convert.ToString(Convert.ToChar(65 + colIndex)),
DataType = CellValues.String
};
CellValue cellValue = new CellValue();
if (rowIndex == -1)
{
cellValue.Text = table.Columns[colIndex].ColumnName.ToString();
}
else
{
cellValue.Text = (table.Rows[rowIndex].ItemArray[colIndex].ToString() != "") ? table.Rows[rowIndex].ItemArray[colIndex].ToString() : "*";
}
cell.Append(cellValue);
excelRow.Append(cell);
}
sheetData.Append(excelRow);
}
SheetFormatProperties formattingProps = new SheetFormatProperties()
{
DefaultColumnWidth = 20D,
DefaultRowHeight = 20D
};
worksheet.Append(formattingProps);
worksheet.Append(sheetData);
worksheetPart.Worksheet = worksheet;
}
while the reading function is as following:
public static void readInventoryExcel(string fileName, ref DataSet set)
{
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
{
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
int count = -1;
foreach (Row r in sheetData.Elements<Row>())
{
if (count >= 0)
{
DataRow row = set.Tables[0].NewRow();
row["SerialNumber"] = r.ChildElements[1].InnerXml;
row["PartNumber"] = r.ChildElements[2].InnerXml;
row["EntryDate"] = r.ChildElements[3].InnerXml;
row["RetirementDate"] = r.ChildElements[4].InnerXml;
row["ReasonForReplacement"] = r.ChildElements[5].InnerXml;
row["RetirementTech"] = r.ChildElements[6].InnerXml;
row["IncludeInMaintenance"] = r.ChildElements[7].InnerXml;
row["MaintenanceTech"] = r.ChildElements[8].InnerXml;
row["Comment"] = r.ChildElements[9].InnerXml;
row["Station"] = r.ChildElements[10].InnerXml;
row["LocationStatus"] = r.ChildElements[11].InnerXml;
row["AssetName"] = r.ChildElements[12].InnerXml;
row["InventoryType"] = r.ChildElements[13].InnerXml;
row["Description"] = r.ChildElements[14].InnerXml;
set.Tables[0].Rows.Add(row);
}
count++;
}
}
}
I think this is caused by the fact that you have only one sheet whereas Excel has three. I'm not certain but I think the sheets are returned in reverse order so you should change the line:
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
to
WorksheetPart worksheetPart = workbookPart.WorksheetParts.Last();
It might be safer to search for the WorksheetPart if you can identify it by the sheet name. You need to find the Sheet first then use the Id of that to find the SheetPart:
private WorksheetPart GetWorksheetPartBySheetName(WorkbookPart workbookPart, string sheetName)
{
//find the sheet first.
IEnumerable<Sheet> sheets = workbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>().Where(s => s.Name == sheetName);
if (sheets.Count() > 0)
{
string relationshipId = sheets.First().Id.Value;
WorksheetPart worksheetPart = (WorksheetPart)workbookPart.GetPartById(relationshipId);
return worksheetPart;
}
return null;
}
You can then use:
WorksheetPart worksheetPart = GetWorksheetPartBySheetName(workbookPart, "Sheet1");
There are a couple of other things I've noticed whilst looking at your code which you may (or may not!) be interested in:
In your code you are only reading the InnerXml so it might not matter to you but the way Excel stores strings is different to the way you are writing them so reading an Excel generated file may not give you the values you expect. In your example you are writing the string directly to the cell like this:
But Excel uses a SharedStrings concept where all strings are written to a separate XML file called sharedStrings.xml. That file contains the strings used in the Excel file with a reference and it's that value that is stored in the cell value in the sheet XML.
The sharedString.xml looks like this:
And the Cell then looks like this:
The 47 in the <v> element is a reference to the 47th shared string. Note that the type (the t attribute) in your generated XML is str but the type in the Excel generated file is s. This denotes yours is an inline string and theirs is a shared string.
You can read the SharedStrings just as you would any other part:
var stringTable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
if (stringTable != null)
{
sharedString = stringTable.SharedStringTable.ElementAt(int.Parse(value)).InnerText;
}
Secondly, if you look at the cell reference that your code generates and the cell reference that Excel generates you can see you are only outputting the column and not the row (e.g. you output A instead of A1). To fix this you should change the line:
CellReference = Convert.ToString(Convert.ToChar(65 + colIndex)),
to
CellReference = Convert.ToString(Convert.ToChar(65 + colIndex) + rowIndex.ToString()),
I hope that helps.
I ran into a similar issue a while back trying to do this for Word documents (procedurally generated worked fine, but human-generated did not). I found this tool to be very helpful:
http://www.microsoft.com/en-us/download/details.aspx?id=30425
Basically, it looks at a file and shows you the code that Microsoft would generate to read it, as well as the xml structure of the file itself. As usual for Microsoft products, there are quite a few menus and it's not very intuitive, but after clicking around for a bit you will be able to see exactly what is going on with any two files. I would recommend you open a working excel file and a non-working one and compare the difference to see what's causing your issue.
Below is the OpenXML code that I use to read in a particular Worksheet from an Excel file, into a DataTable.
First, here's how you'd call it:
DataTable dt = OpenXMLHelper.ExcelWorksheetToDataTable("C:\\SQL Server\\SomeExcelFile.xlsx", "Mikes Worksheet");
And here's the code:
public class OpenXMLHelper
{
public static DataTable ExcelWorksheetToDataTable(string pathFilename, string worksheetName)
{
DataTable dt = new DataTable(worksheetName);
using (SpreadsheetDocument document = SpreadsheetDocument.Open(pathFilename, false))
{
// Find the sheet with the supplied name, and then use that
// Sheet object to retrieve a reference to the first worksheet.
Sheet theSheet = document.WorkbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == worksheetName).FirstOrDefault();
if (theSheet == null)
throw new Exception("Couldn't find the worksheet: " + worksheetName);
// Retrieve a reference to the worksheet part.
WorksheetPart wsPart = (WorksheetPart)(document.WorkbookPart.GetPartById(theSheet.Id));
Worksheet workSheet = wsPart.Worksheet;
string dimensions = workSheet.SheetDimension.Reference.InnerText; // Get the dimensions of this worksheet, eg "B2:F4"
int numOfColumns = 0;
int numOfRows = 0;
CalculateDataTableSize(dimensions, ref numOfColumns, ref numOfRows);
System.Diagnostics.Trace.WriteLine(string.Format("The worksheet \"{0}\" has dimensions \"{1}\", so we need a DataTable of size {2}x{3}.", worksheetName, dimensions, numOfColumns, numOfRows));
SheetData sheetData = workSheet.GetFirstChild<SheetData>();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
string[,] cellValues = new string[numOfColumns, numOfRows];
int colInx = 0;
int rowInx = 0;
string value = "";
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
// Iterate through each row of OpenXML data
foreach (Row row in rows)
{
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
// *DON'T* assume there's going to be one XML element for each item in each row...
Cell cell = row.Descendants<Cell>().ElementAt(i);
if (cell.CellValue == null || cell.CellReference == null)
continue; // eg when an Excel cell contains a blank string
// Convert this Excel cell's CellAddress into a 0-based offset into our array (eg "G13" -> [6, 12])
colInx = GetColumnIndexByName(cell.CellReference); // eg "C" -> 2 (0-based)
rowInx = GetRowIndexFromCellAddress(cell.CellReference)-1; // Needs to be 0-based
// Fetch the value in this cell
value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
value = stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
cellValues[colInx, rowInx] = value;
}
dt.Rows.Add(dataRow);
}
// Copy the array of strings into a DataTable
for (int col = 0; col < numOfColumns; col++)
dt.Columns.Add("Column_" + col.ToString());
for (int row = 0; row < numOfRows; row++)
{
DataRow dataRow = dt.NewRow();
for (int col = 0; col < numOfColumns; col++)
{
dataRow.SetField(col, cellValues[col, row]);
}
dt.Rows.Add(dataRow);
}
#if DEBUG
// Write out the contents of our DataTable to the Output window (for debugging)
string str = "";
for (rowInx = 0; rowInx < maxNumOfRows; rowInx++)
{
for (colInx = 0; colInx < maxNumOfColumns; colInx++)
{
object val = dt.Rows[rowInx].ItemArray[colInx];
str += (val == null) ? "" : val.ToString();
str += "\t";
}
str += "\n";
}
System.Diagnostics.Trace.WriteLine(str);
#endif
return dt;
}
}
private static void CalculateDataTableSize(string dimensions, ref int numOfColumns, ref int numOfRows)
{
// How many columns & rows of data does this Worksheet contain ?
// We'll read in the Dimensions string from the Excel file, and calculate the size based on that.
// eg "B1:F4" -> we'll need 6 columns and 4 rows.
//
// (We deliberately ignore the top-left cell address, and just use the bottom-right cell address.)
try
{
string[] parts = dimensions.Split(':'); // eg "B1:F4"
if (parts.Length != 2)
throw new Exception("Couldn't find exactly *two* CellAddresses in the dimension");
numOfColumns = 1 + GetColumnIndexByName(parts[1]); // A=1, B=2, C=3 (1-based value), so F4 would return 6 columns
numOfRows = GetRowIndexFromCellAddress(parts[1]);
}
catch
{
throw new Exception("Could not calculate maximum DataTable size from the worksheet dimension: " + dimensions);
}
}
public static int GetRowIndexFromCellAddress(string cellAddress)
{
// Convert an Excel CellReference column into a 1-based row index
// eg "D42" -> 42
// "F123" -> 123
string rowNumber = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^0-9 _]", "");
return int.Parse(rowNumber);
}
public static int GetColumnIndexByName(string cellAddress)
{
// Convert an Excel CellReference column into a 0-based column index
// eg "D42" -> 3
// "F123" -> 5
var columnName = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^A-Z_]", "");
int number = 0, pow = 1;
for (int i = columnName.Length - 1; i >= 0; i--)
{
number += (columnName[i] - 'A' + 1) * pow;
pow *= 26;
}
return number - 1;
}
}
Just to mention, some of our company's Excel Worksheets have one or more blank rows at the top. Strangely, this prevented some other OpenXML libraries from reading in such Worksheets properly.
This code deliberately creates a DataTable with one value for each of the cells in the Worksheet, even the blank ones at the top.

Excel file with openxml with multiple sheets in single workbook

I am trying to insert data to multiple sheets. For my excel, I have two sheets which are "charts" and "ChartData". I'm able to update the data in sheet2, chartdata sheet, but I'm unable to insert data to sheet1. Here is code which I have tried to insert data into excel sheets. Here data is coming from database.
double ticks = DateTime.Now.Ticks;
// MarketAnalysis ms = new MarketAnalysis();
//ms.Marketanalysis();
File.Copy(Srcpath, #"E:\Works\OpenXML\DownloadTemplates\ExcelGenerated" + ticks + ".xlsx", true);
using (SpreadsheetDocument myworkbok = SpreadsheetDocument.Open(#"E:\Works\OpenXML\DownloadTemplates\ExcelGenerated" + ticks + ".xlsx", true))
{
//Acess the main workbook which contain all the references
WorkbookPart workbookpart = myworkbok.WorkbookPart;
//Get sheet by name
Sheet sheet = workbookpart.Workbook.Descendants<Sheet>().Where(s => s.Name == "ChartData").FirstOrDefault();
//Worksheet Part by ID
WorksheetPart worksheetpart = workbookpart.GetPartById(sheet.Id) as WorksheetPart;
//Sheet data contains all the data
SheetData sheetdata = worksheetpart.Worksheet.GetFirstChild<SheetData>();
DataSet ds = db.Chart1Data();
for (int i = 0; i < ds.Tables[0].Rows.Count; i++)
{
// if (ds.Tables[0].Rows !=DBNull)
//{
string Rowlabel = ds.Tables[0].Rows[i][0].ToString();
// float? FY13Actuval = Convert.ToInt32(ds.Tables[0].Rows[i][1]);
if (!string.IsNullOrEmpty(ds.Tables[0].Rows[i][1].ToString()))
{
// string s = ds.Tables[0].Rows[i][1].ToString();
//FY13Actuval=float.Parse(ds.Tables[0].Rows[i][1].ToString());
FY13Actuval = Convert.ToDouble(ds.Tables[0].Rows[i][1].ToString());
}
else
{
FY13Actuval = null;
}
double? Budget = Convert.ToDouble(ds.Tables[0].Rows[i][2].ToString());
double? Actuval = Convert.ToDouble(ds.Tables[0].Rows[i][3].ToString());
Row contentrow = CreateContentRow(index, Product, Actual, Budget, Forecast);
index++;
sheetdata.AppendChild(contentrow);
// }
}
.......Same code for the other 3 charts
Sheet sheet1 = workbookpart.Workbook.Descendants<Sheet>().Where(s => s.Name == "Charts").FirstOrDefault();
WorksheetPart worksheetpart1 = workbookpart.GetPartById(sheet1.Id) as WorksheetPart;
//Sheet data contains all the data
SheetData sheetdata1 = worksheetpart1.Worksheet.GetFirstChild<SheetData>();
DataSet dsTbl = db.Table();
int SCMTblIndex=5;
for (int i = 0; i < dsTbl.Tables[0].Rows.Count; i++)
{
Row tblRow = CreateScorecardMetricTblRow(SCMTblIndex, dsTbl.Tables[0].Rows[i][0].ToString(),
dsTbl.Tables[0].Rows[i][1].ToString(),
dsTbl.Tables[0].Rows[i][2].ToString());
SCMTblIndex++;
sheetdata1.AppendChild(tblRow);
}
myworkbok.WorkbookPart.Workbook.Save();
}
and the Methods to create the cells:
public static string[] headerColumns = new string[]{ "A", "B", "C", "D", "E", "F", "G", "I", "J" };
public static string[] header = new string[] { "D", "E", "F" };
private static Row CreateContentRow(int index, string Product, double? Actual, double? Budget, double? Forecast)
{
//Create New ROw
Row r = new Row();
r.RowIndex = (UInt32)index;
//Begin colums
Cell c0 = new Cell();
c0.CellReference = headerColumns[0] + index;
CellValue v0 = new CellValue();
v0.Text = Product;
c0.AppendChild(v0);
r.AppendChild(c0);
Cell c1 = new Cell();
c1.CellReference = headerColumns[1] + index;
CellValue v1 = new CellValue();
v1.Text = Actual.ToString();
c1.AppendChild(v1);
r.AppendChild(c1);
Cell c2 = new Cell();
c2.CellReference = headerColumns[2] + index;
CellValue v2 = new CellValue();
v2.Text = Budget.ToString();
c2.AppendChild(v2);
r.AppendChild(c2);
Cell c3 = new Cell();
c3.CellReference = headerColumns[3] + index;
CellValue v3 = new CellValue();
v3.Text = Forecast.ToString();
c3.AppendChild(v3);
r.AppendChild(c3);
return r;
}
public static Row CreateScorecardMetricTblRow(int index, string Act_Data, string Bud_Data, string VarPr_Data)
{
Row r = new Row();
r.RowIndex = (UInt32)index;
//Begin Colums
Cell c0 = new Cell();
c0.CellReference = header[0] + index;
CellValue v0 = new CellValue();
v0.Text = Act_Data;
c0.AppendChild(v0);
r.AppendChild(c0);
Cell c1 = new Cell();
c1.CellReference = header[1] + index;
CellValue v1 = new CellValue();
v1.Text = Bud_Data;
c1.AppendChild(v1);
r.AppendChild(c1);
Cell c2 = new Cell();
c2.CellReference = header[2] + index;
CellValue v2 = new CellValue();
v2.Text = VarPr_Data;
c2.AppendChild(v2);
r.AppendChild(c2);
return r;
}
for multiple sheets to updated just
// Open the document for editing.
using (SpreadsheetDocument spreadSheet = SpreadsheetDocument.Open(docName, true))
{
// Insert code here.
}
create a two classes for two sheets which you need to update sheet cell values.For each class constructor pass the SpreadsheetDocument object and implement insert text into a cell in a spreadsheet document.Just avoid the code for insertion of new sheet and try the insertion of the cell code only.
You can try this method.....it may be helpful..
private static Cell InsertCellInWorksheet(string columnName, int rowIndex, WorksheetPart worksheetPart)
{
Worksheet worksheet = worksheetPart.Worksheet;
SheetData sheetData = worksheet.GetFirstChild<SheetData>();
string cellReference = columnName + rowIndex;
Alignment alignment1 = new Alignment() { WrapText = true };
// If the worksheet does not contain a row with the specified row index, insert one.
Row row;
row = new Row() { RowIndex = 3, StyleIndex = 1 };
if (sheetData.Elements<Row>().Where(r => r.RowIndex == rowIndex).Count() != 0)
{
row = sheetData.Elements<Row>().Where(r => r.RowIndex == rowIndex).First();
}
else
{
row = new Row() { RowIndex = Convert.ToUInt32(rowIndex) };
sheetData.Append(row);
}
// If there is not a cell with the specified column name, insert one.
if (row.Elements<Cell>().Where(c => c.CellReference.Value == columnName + rowIndex).Count() > 0)
{
Row row2;
row2 = new Row() { RowIndex = 3, StyleIndex = 1 };
Cell rCell = null;
foreach (Cell celld in row.Elements<Cell>())
{
if (string.Compare(celld.CellReference.Value, "A3", true) > 0)
{
rCell = celld;
break;
}
}
// Add the cell to the cell table at A1.
Cell newCell = new Cell() { CellReference = "C3" };
//Cell newCell1 = new Cell() { CellReference = "D3" };
row.InsertBefore(newCell, rCell);
//row.InsertBefore(newCell1, rCell);
// Set the cell value to be a numeric value of 100.
newCell.CellValue = new CellValue("#GUIDeXactLCMS#");
//newCell1.CellValue = new CellValue("EN");
newCell.DataType = new EnumValue<CellValues>(CellValues.Number);
return row.Elements<Cell>().Where(c => c.CellReference.Value == cellReference).First();
}
else
{
// Cells must be in sequential order according to CellReference. Determine where to insert the new cell.
Cell refCell = null;
foreach (Cell cell in row.Elements<Cell>())
{
if (string.Compare(cell.CellReference.Value, cellReference, true) > 0)
{
//string val = cell.CellReference.Value;
refCell = cell;
break;
}
}
Cell newCell = new Cell() { CellReference = cellReference };
row.InsertBefore(newCell, refCell);
return newCell;
}
worksheet.Save();
}

Categories

Resources