Append columns before Sheetdata - c#

I need to add columns with specific widths to Worksheet while creating new Excel file. According to everywhere I looked this has to be done before Sheetdata (one link here). However I tried numerious things but can't get It working. My code for creating Excel file is from official site (link here - It's meant for ASP.NET but works fine for me).
Here is my code and last attempt to get It working:
public void Export_Datagridview(DataGridView dgv, string filename)
{
using (var workbook = SpreadsheetDocument.Create(filename, SpreadsheetDocumentType.Workbook))
{
var workbookPart = workbook.AddWorkbookPart();
workbook.WorkbookPart.Workbook = new Workbook();
workbook.WorkbookPart.Workbook.Sheets = new Sheets();
var sheetPart = workbook.WorkbookPart.AddNewPart<WorksheetPart>();
var sheetData = new SheetData();
//this part is giving me "object reference" error
sheetPart.Worksheet.InsertBefore(AutoFit_Columns(dgv, sheetPart.Worksheet), sheetData);
...//and so on...
And code for adding columns :
private Columns AutoFit_Columns(DataGridView dgv, Worksheet worksheet)
{
Columns cols = new Columns();
for (int col = 0; col < dgv.ColumnCount; colc++)
{
double max_width = 14.5; //something like default width in Excel
for (int row = 0; row < dgv.RowCount; row++)
{
double cell_width = Text_width(dgv.Rows[row].Cells[col].Value.ToString(), new System.Drawing.Font("Arial", 12.0F));
if (cell_width > max_width)
{
max_width = cell_width;
}
if (row == dgv.RowCount - 1) //last iteration - here we allready have max width within column
{
Column c = new Column() { Min = Convert.ToUInt32(col), Max = Convert.ToUInt32(col), Width = max_width, CustomWidth = true };
cols.Append(c);
worksheet.Append(cols);
}
}
}
return cols;
}
As you see, this is my attempt to Autofit columns based on data that get's exported from Datagridview. But before I can test other code I need to add columns properly. Any help kindly appreciated !

Figured It out, wasn't so easy to solve and identify all problems. Firstly I needed to change upper code to this:
public void Export_Datagridview(DataGridView dgv, string filename)
{
using (var workbook = SpreadsheetDocument.Create(filename, SpreadsheetDocumentType.Workbook))
{
var workbookPart = workbook.AddWorkbookPart();
workbook.WorkbookPart.Workbook = new Workbook();
workbook.WorkbookPart.Workbook.Sheets = new Sheets();
var sheetPart = workbook.WorkbookPart.AddNewPart<WorksheetPart>();
var sheetData = new SheetData();
//this part is new - I had to change method for autofit too
sheetPart.Worksheet = new Worksheet();
sheetPart.Worksheet.Append(AutoFit_Columns(dgv));
sheetPart.Worksheet.Append(sheetData);
...//and so on...
and then change Autofit method. It was still causing errors because loop started from 0, and there is no Column with index of 0. Changed It to 1:
for (int col = 1; col < dgv.ColumnCount; colc++)
{
Excel now opens with column widths set, though code for autofit needed adjustments. If anyone interested, here is my complete solution with autofit included.

Related

Loop on excel rows in different Excel worksheets C#

I have an excel file myexcel.xlsx which has multiple worksheets. All worksheets has the same columns names in the first row. One column is called ID and another column is called Total. I am going through each row in every worksheet then I wonder how I can then check the columns ID if it exists in any other row in the same or other worksheet. If the ID is not found somewhere else then the total will be only equal to the Total column of this row, but if the same ID exists in another row in the same/other worksheet then I want to add the Total column of the other row as well then ignore all these rows of the same ID in the for loop so that they are not repeated.
Excel.Application myapp = new Excel.Application();
Excel.Workbook myworkbook = myapp(#"myexcel.xlsx");
for (int i = 1; i <= myworkbook.Worksheets.Count; i++)
{
Excel._Worksheet myworksheet = myworkbook.Worksheets[i];
Excel.Range myrange = myworksheet.UsedRange;
int myrowCount = myrange.Rows.Count;
}
I don't know the correct syntax for working with Excel sheets, so I'll give you a basic example for what I think you are asking. You'll have to adjust the code so it works, but if you want to aggregate (get the sum) of the "Total" in all the rows with the same "ID" you should be able to do something like this:
var totalsDictionary = new Dictionary<int, int>();
for (var ws = 0; ws < worksheets.Count; ws++)
{
var worksheet = worksheets[ws];
for (var row = 0; row < worksheet.Rows.Count; row++)
{
var id = worksheet.Rows[row]["ID"];
var total = worksheet.Rows[row]["Total"];
if (totalsDictionary.ContainsKey(id))
{
totalsDictionary[id] += total;
}
else
{
totalsDictionary.Add(id, total);
}
}
}
// totalsDictionary now contains the sum of Totals for each ID

Add text to Excel sheet before data conversion

I want to add some basic information regarding the DataGridView before my DGV gets converted to Excel. This is an example from the internet sample output, currently I can upload all the data to Excel, but I'm not sure how to hardcode the title (row1 from pic) and some basic information (row 2 from pic) before the data gets converted.
If anything is unclear let me know.
//Create headers
for (int i = 0; i < dv.Columns.Count; i++)
{
Excel.Range CellHeadersRange = ws.get_Range(GetExcelColumnName(i + 1) + rowstartindex.ToString(), GetExcelColumnName(i + 1) + rowstartindex.ToString());
//Setting the borders
CellHeadersRange.BorderAround2(LineStyle.Thin, Excel.XlBorderWeight.xlThin, Excel.XlColorIndex.xlColorIndexNone,
Color.FromArgb(255, 0, 0), Type.Missing);
//Aligning headers to the middle
CellHeadersRange.HorizontalAlignment = Excel.XlHAlign.xlHAlignCenter;
//Updating header value to Excel
CellHeadersRange.Value = dv.Columns[i].HeaderText;
CellHeadersRange.Font.Bold = true;
double widthDGV = (dv.Columns[i].Width) / 10;
Math.Ceiling(widthDGV);
CellHeadersRange.ColumnWidth = widthDGV * 2;
}
//Write data
for (int i = 0; i < dv.Rows.Count - 1; i++)
{
for (int j = 0; j < dv.Columns.Count; j++)
{
Excel.Range CellDataRange = ws.get_Range(GetExcelColumnName(j + 1) + (i + rowstartindex + 1).ToString());
CellDataRange.WrapText = true;
//Setting borders for all cells
CellDataRange.BorderAround2(LineStyle.Thin, Excel.XlBorderWeight.xlThin, Excel.XlColorIndex.xlColorIndexNone, Color.FromArgb(255, 0, 0), Type.Missing);
CellDataRange.Value = dv[j, i].Value;
//Verify that backgroundcolor of datagrid is not RGB(0,0,0,0) and in that case apply datagridviewcell color to excel range
if (!dv.Rows[i].Cells[j].Style.BackColor.IsEmpty)
CellDataRange.Interior.Color = dv.Rows[i].Cells[j].Style.BackColor;
//Verify that font style exist before checking for bold and in that case apply datagridviewcell font.bold property to excel range
if (dv.Rows[i].Cells[j].Style.Font != null)
CellDataRange.Font.Bold = dv.Rows[i].Cells[j].Style.Font.Bold;
}
}
wb = null;
ws = null;
xlApp = null;
GC.Collect();
GC.WaitForPendingFinalizers();
Edit: Maybe my question is all over places due to that I'm getting no response so I was wondering how can I add data to the first row of an existing excel sheet? Lets say using my code i first generate my excel sheet than I would like to insert couple new rows in the beginning of the sheet. How can I achieve that??
Solution:
//Writitng the title
Excel.Range line = (Excel.Range)ws.Rows[1];
line.Insert();
Excel.Range rng = ws.get_Range("B1");
rng.RowHeight = 35;
rng.Font.Bold = true;
rng.Value = "Assumptions";
rng.HorizontalAlignment = Excel.XlHAlign.xlHAlignCenter;
rng.Font.Size = 14;
rng.Interior.Color = Color.LightBlue;
rng = ws.get_Range("A1");
rng.Interior.Color = Color.LightBlue;
You can manipulate where you want to insert row by changing this ws.Rows[x];

OpenXML - change Excel cell format (Date and Number) when exporting from Datagridview

I using OpenXML to export Datagridview to Excel. If I export cells with CellValues.String evertyhing works fine without any errors in Excel file, but what I need is to properly convert all Date and Number data into corresponding cell format. I've tried to use built-in formats (not custom ones) to change format of cells, but then my Excel got corrupted.
Here is what I tried so far:
public void Export_to_Excel(DataGridView dgv, string path)
{
using (var workbook = SpreadsheetDocument.Create(path, SpreadsheetDocumentType.Workbook))
{
var workbookPart = workbook.AddWorkbookPart();
workbook.WorkbookPart.Workbook = new Workbook();
workbook.WorkbookPart.Workbook.Sheets = new Sheets();
var sheetPart = workbook.WorkbookPart.AddNewPart<WorksheetPart>();
var sheetData = new SheetData();
sheetPart.Worksheet = new Worksheet(sheetData);
Sheets sheets = workbook.WorkbookPart.Workbook.GetFirstChild<Sheets>();
string relationshipId = workbook.WorkbookPart.GetIdOfPart(sheetPart);
uint sheetId = 1;
if (sheets.Elements<Sheet>().Count() > 0)
{
sheetId =
sheets.Elements<Sheet>().Select(s => s.SheetId.Value).Max() + 1;
}
Sheet sheet = new Sheet() { Id = relationshipId, SheetId = sheetId, Name = "List "+ sheetId};
sheets.Append(sheet);
Row headerRow = new Row();
// Construct column names
List<String> columns = new List<string>();
foreach (DataGridViewColumn column in dgv.Columns)
{
columns.Add(column.Name);
Cell cell = new Cell
{
DataType = CellValues.String,
CellValue = new CellValue(column.HeaderText)
};
headerRow.AppendChild(cell);
}
// Add the row values to the excel sheet
sheetData.AppendChild(headerRow);
foreach (DataGridViewRow dsrow in dgv.Rows)
{
Row newRow = new Row();
foreach (String col in columns)
{
CellValues cell_type = new CellValues();
string cell_value = "";
UInt32 style_index;
if (dsrow.Cells[col].ValueType == typeof(decimal)) //numbers
{
cell_type = CellValues.Number;
cell_value = ((decimal)dsrow.Cells[col].Value).ToString();
style_index = 4; //should be #,##0.00
}
else if (dsrow.Cells[col].ValueType == typeof(DateTime)) //dates
{
cell_type = CellValues.String;
cell_value = ((DateTime)dsrow.Cells[col].Value).ToString("dd.mm.yyyy");
style_index =0; //should be General
}
else
{
cell_type = CellValues.String;
cell_value = dsrow.Cells[col].Value.ToString();
index_stila = 0; //should be General
}
Cell cell = new Cell();
cell.DataType = new EnumValue<CellValues>(cell_type);
cell.CellValue = new CellValue(cell_value);
cell.StyleIndex = style_index;
newRow.AppendChild(cell);
}
sheetData.AppendChild(newRow);
}
}
}
So basically, what I would like is to have this cells formatted correctly. In above code I tried only for Number format, but I need same for Date format too. Here is also a link to built-in styles for OpenXML.
I solved above problem. I must say that working with OpenXML is a bit frustrating but I'm happy with end results.
I decided – based on many OpenXML topics - to extend answer with providing a full useable code, not just examples as I ussually encountered on many sites.
My basic requirement was to export Datagridview data into Excel file, with correct cell formatting and faster export speed than current Interop solution we use. Code below can be used with Datatable or Dataset also, with just a slight modification. I've also added some other functionalities which in my opinion should be documented as that Is what most programmers need in Excel, but unfortunally they're not.
I won't go in depths for everything since I allready had some headaches doing all of that, so let's cut to the chase. Result of complete code below is Excel file with exported data from Datagridview and :
column names same as Datagridview headers & in bold font;
changed default font »Calibri« to »Arial«;
cell formating based on actual data from Datatable(dates,numbers & string) with desired format;
Save File dialog prompt;
autofit for columns;
As many others stated, order in OpenXML is very important. That applies for pretty much everything – when you create document or style It. So everything you see here works fine for me in Office 2016, but If you do some line mixing you end up very fast with some kind of weird errors in Excel… As promised, here is my full code:
public void Export_to_Excel(DataGridView dgv, string file_name)
{
String file_path= Environment.GetFolderPath(Environment.SpecialFolder.Desktop).ToString() + "\\" +file_name + ".xlsx";
SaveFileDialog saveFileDialog = new SaveFileDialog();
saveFileDialog.InitialDirectory = Convert.ToString(Environment.SpecialFolder.Desktop);
saveFileDialog.Filter = "Excel Workbook |*.xlsx";
saveFileDialog.Title = "Save as";
saveFileDialog.FileName = file_name;
if (saveFileDialog.ShowDialog() == DialogResult.OK)
{
file_path = saveFileDialog.FileName;
}
else
{
return;
}
using (var workbook = SpreadsheetDocument.Create(file_path, SpreadsheetDocumentType.Workbook))
{
var workbookPart = workbook.AddWorkbookPart();
workbook.WorkbookPart.Workbook = new Workbook();
workbook.WorkbookPart.Workbook.Sheets = new Sheets();
var sheetPart = workbook.WorkbookPart.AddNewPart<WorksheetPart>();
var sheetData = new SheetData();
//Autofit comes first – we calculate width of columns based on data
sheetPart.Worksheet = new Worksheet();
sheetPart.Worksheet.Append(AutoFit_Columns(dgv));
sheetPart.Worksheet.Append(sheetData);
//Adding styles to worksheet
Worksheet_Style(workbook);
Sheets sheets = workbook.WorkbookPart.Workbook.GetFirstChild<Sheets>();
string relationshipId = workbook.WorkbookPart.GetIdOfPart(sheetPart);
uint sheetId = 1;
if (sheets.Elements<Sheet>().Count() > 0)
{
sheetId = sheets.Elements<Sheet>().Select(s => s.SheetId.Value).Max() + 1;
}
Sheet sheet = new Sheet() { Id = relationshipId, SheetId = sheetId, Name = "List " + sheetId };
sheets.Append(sheet);
Row headerRow = new Row(); //Adding column headers
for (int col = 0; col < dgv.ColumnCount; col++)
{
Cell cell = new Cell
{
DataType = CellValues.String,
CellValue = new CellValue(dgv.Columns[col].HeaderText),
StyleIndex = 1// bold font
};
headerRow.AppendChild(cell);
}
// Add the row values to the excel sheet
sheetData.AppendChild(headerRow);
for (int row = 0; row < dgv.RowCount; row++)
{
Row newRow = new Row();
for (int col = 0; col < dgv.ColumnCount; col++)
{
Cell cell = new Cell();
//Checking types of data
// I had problems here with Number format, I just can't set It to a
// Datatype=CellValues.Number. If someone knows answer please let me know. However, Date format strangely works fine with Number datatype ?
// Also important – whatever format you define in creating stylesheets, you have to insert value of same kind in string here – for CellValues !
// I used cell formating as I needed, for something else just change Worksheet_Style method to your needs
if (dgv.Columns[col].ValueType == typeof(decimal)) //numbers
{
cell.DataType = new EnumValue<CellValues>(CellValues.String);
cell.CellValue = new CellValue(((decimal)dgv.Rows[row].Cells[col].Value).ToString("#,##0.00"));
cell.StyleIndex = 3;
}
else if (dgv.Columns[col].ValueType == typeof(DateTime)) //dates
{
cell.DataType = new EnumValue<CellValues>(CellValues.Number);
cell.CellValue = new CellValue(((DateTime)dgv.Rows[row].Cells[col].Value).ToOADate().ToString(CultureInfo.InvariantCulture));
cell.StyleIndex = 2;
}
Else // strings
{
cell.DataType = new EnumValue<CellValues>(CellValues.String);
cell.CellValue = new CellValue(dgv.Rows[row].Cells[col].Value.ToString());
cell.StyleIndex = 0;
}
newRow.AppendChild(cell);
}
sheetData.AppendChild(newRow);
}
}
}
private static WorkbookStylesPart Worksheet_Style (SpreadsheetDocument document)
{
WorkbookStylesPart create_style = document.WorkbookPart.AddNewPart<WorkbookStylesPart>();
Stylesheet workbookstylesheet = new Stylesheet();
DocumentFormat.OpenXml.Spreadsheet.Font font0 = new DocumentFormat.OpenXml.Spreadsheet.Font(); // Default font
FontName arial = new FontName() { Val = "Arial" };
FontSize size = new FontSize() { Val = 10 };
font0.Append(arial);
font0.Append(size);
DocumentFormat.OpenXml.Spreadsheet.Font font1 = new DocumentFormat.OpenXml.Spreadsheet.Font(); // Bold font
Bold bold = new Bold();
font1.Append(bold);
// Append both fonts
Fonts fonts = new Fonts();
fonts.Append(font0);
fonts.Append(font1);
//Append fills - a must, in my case just default
Fill fill0 = new Fill();
Fills fills = new Fills();
fills.Append(fill0);
// Append borders - a must, in my case just default
Border border0 = new Border(); // Default border
Borders borders = new Borders();
borders.Append(border0);
// CellFormats
CellFormats cellformats = new CellFormats();
CellFormat cellformat0 = new CellFormat() { FontId = 0, FillId = 0, BorderId = 0 }; // Default style : Mandatory | Style ID =0
CellFormat bolded_format = new CellFormat() { FontId = 1 }; // Style with Bold text ; Style ID = 1
CellFormat date_format = new CellFormat() { BorderId = 0, FillId = 0, FontId = 0, NumberFormatId = 14, FormatId = 0, ApplyNumberFormat = true };
CellFormat number_format = new CellFormat() { BorderId = 0, FillId = 0, FontId = 0, NumberFormatId = 4, FormatId = 0, ApplyNumberFormat = true }; // format like "#,##0.00"
cellformats.Append(cellformat0);
cellformats.Append(bolded_format);
cellformats.Append(date_format);
cellformats.Append(number_format);
// Append everyting to stylesheet - Preserve the ORDER !
workbookstylesheet.Append(fonts);
workbookstylesheet.Append(fills);
workbookstylesheet.Append(borders);
workbookstylesheet.Append(cellformats);
//Save style for finish
create_style.Stylesheet = workbookstylesheet;
create_style.Stylesheet.Save();
return create_style;
}
private Columns AutoFit_Columns(DataGridView dgv)
{
Columns cols = new Columns();
int Excel_column=0;
DataTable dt = new DataTable();
dt = (DataTable)dgv.DataSource;
for (int col = 0; col < dgv.ColumnCount; col++)
{
double max_width = 14.5f; // something like default Excel width, I'm not sure about this
//We search for longest string in each column and convert that into double to get desired width
string longest_string = dt.AsEnumerable()
.Select(row => row[col].ToString())
.OrderByDescending(st => st.Length).FirstOrDefault();
double cell_width = GetWidth(new System.Drawing.Font("Arial", 10), longest_string);
if (cell_width > max_width)
{
max_width = cell_width;
}
if (col == 0) //first column of Datagridview is index 0, but there is no 0 index of column in Excel, careful with that !!!
{
Excel_column = 1;
}
//now append column to worksheet, calculations done
Column c = new Column() { Min = Convert.ToUInt32(Excel_column), Max = Convert.ToUInt32(Excel_column), Width = max_width, CustomWidth = true };
cols.Append(c);
Excel_column++;
}
return cols;
}
private static double GetWidth(System.Drawing.Font stringFont, string text)
{
// This formula calculates width. For better desired outputs try to change 0.5M to something else
Size textSize = TextRenderer.MeasureText(text, stringFont);
double width = (double)(((textSize.Width / (double)7) * 256) - (128 / 7)) / 256;
width = (double)decimal.Round((decimal)width + 0.5M, 2);
return width;
}
Method, in my case from a .dll can be called easily like:
Export_to_Excel(my_dgv, »test_file«)
Short explanation of some stuff in code:
1.) Styles: there are many options of how I could do It, but that was the easiest way for me. When you will need something harder, try not to forget that order counts here too. And appending Fonts,Fills and Borders is neccesary.
2.) Autofit: I can't believe why that isn't documented allready, and my opinion is that OpenXML should have some method for that by default. Anyway, I solved that by using LINQ and with a help from here. I hope author doesn't mind, but someone should say that out loud :)
And now, for the end, my test results & advantages/disadvantages comparing with Interop. I tested on Excel 2016 with 200k rows of data:
Interop
Exported data in almost 3 minutes;
Advantages:
easier coding (in my opinion) with lots of built-in features, such as (ofcourse) Autofit;
you can actually create Excel file (object) that isn't saved to disk allready;
Disadvantages:
slow comparing to any other libraries such as OpenXML, though I could probably cut down 3 minutes to maybe 2;
I've also noticed huge memory drain on large data, even though I have my Interop code quite optimized;
OpenXML
Exported data (with autofit feature and all styles) in 20 seconds;
Advantages:
much faster than Interop, and I think that my »rubbish« code can be more optimized (you can help with that If you care);
Disandvantages:
coding, not obvious? :)
memory drain higher than in Interop, though OpenXML offers two approaches a.k.a. SAX or DOM method. SAX is even faster than provided code, with almost no memory drain If you write your data to Excel directly from DataReader, but coding took me a lot of time;
I hope nobody will be mad as what I actually did was to put bits and pieces from many sites into something that is actually useful, instead of writing complicated examples that nobody understands. And If anybody cares to improve anything above I would appreciate that a lot. I'm not perfect and more heads together ussually forms a better solution for everybody in the end :)
There seems to be a lot of answers to this type of question that result in an excel that asks to be repaired. I'd normally recommend people use ClosedXML, but if OpenXML is a must then the answer given here: https://stackoverflow.com/a/31829959/994679 does work.
Here's that answer with some extra lines for Date including time cells, number cells, and string cells.
private static void TestExcel()
{
using (var Spreadsheet = SpreadsheetDocument.Create("C:\\Example.xlsx", SpreadsheetDocumentType.Workbook))
{
// Create workbook.
var WorkbookPart = Spreadsheet.AddWorkbookPart();
var Workbook = WorkbookPart.Workbook = new Workbook();
// Add Stylesheet.
var WorkbookStylesPart = WorkbookPart.AddNewPart<WorkbookStylesPart>();
WorkbookStylesPart.Stylesheet = GetStylesheet();
WorkbookStylesPart.Stylesheet.Save();
// Create worksheet.
var WorksheetPart = Spreadsheet.WorkbookPart.AddNewPart<WorksheetPart>();
var Worksheet = WorksheetPart.Worksheet = new Worksheet();
// Add data to worksheet.
var SheetData = Worksheet.AppendChild(new SheetData());
SheetData.AppendChild(new Row(
//Date example. Will show as dd/MM/yyyy.
new Cell() { CellValue = new CellValue(DateTime.Today.ToOADate().ToString(CultureInfo.InvariantCulture)), StyleIndex = 1 },
//Date Time example. Will show as dd/MM/yyyy HH:mm
new Cell() { CellValue = new CellValue(DateTime.Now.ToOADate().ToString(CultureInfo.InvariantCulture)), StyleIndex = 2 },
//Number example
new Cell() { CellValue = new CellValue(123.23d.ToString(CultureInfo.InvariantCulture)), StyleIndex = 0 },
//String example
new Cell() { CellValue = new CellValue("Test string"), DataType = CellValues.String }
));
// Link worksheet to workbook.
var Sheets = Workbook.AppendChild(new Sheets());
Sheets.AppendChild(new Sheet()
{
Id = WorkbookPart.GetIdOfPart(WorksheetPart),
SheetId = (uint)(Sheets.Count() + 1),
Name = "Example"
});
Workbook.Save();
}
}
private static Stylesheet GetStylesheet()
{
var StyleSheet = new Stylesheet();
// Create "fonts" node.
var Fonts = new Fonts();
Fonts.Append(new Font()
{
FontName = new FontName() { Val = "Calibri" },
FontSize = new FontSize() { Val = 11 },
FontFamilyNumbering = new FontFamilyNumbering() { Val = 2 },
});
Fonts.Count = (uint)Fonts.ChildElements.Count;
// Create "fills" node.
var Fills = new Fills();
Fills.Append(new Fill()
{
PatternFill = new PatternFill() { PatternType = PatternValues.None }
});
Fills.Append(new Fill()
{
PatternFill = new PatternFill() { PatternType = PatternValues.Gray125 }
});
Fills.Count = (uint)Fills.ChildElements.Count;
// Create "borders" node.
var Borders = new Borders();
Borders.Append(new Border()
{
LeftBorder = new LeftBorder(),
RightBorder = new RightBorder(),
TopBorder = new TopBorder(),
BottomBorder = new BottomBorder(),
DiagonalBorder = new DiagonalBorder()
});
Borders.Count = (uint)Borders.ChildElements.Count;
// Create "cellStyleXfs" node.
var CellStyleFormats = new CellStyleFormats();
CellStyleFormats.Append(new CellFormat()
{
NumberFormatId = 0,
FontId = 0,
FillId = 0,
BorderId = 0
});
CellStyleFormats.Count = (uint)CellStyleFormats.ChildElements.Count;
// Create "cellXfs" node.
var CellFormats = new CellFormats();
// StyleIndex = 0, A default style that works for most things (But not strings? )
CellFormats.Append(new CellFormat()
{
BorderId = 0,
FillId = 0,
FontId = 0,
NumberFormatId = 0,
FormatId = 0,
ApplyNumberFormat = true
});
// StyleIndex = 1, A style that works for DateTime (just the date)
CellFormats.Append(new CellFormat()
{
BorderId = 0,
FillId = 0,
FontId = 0,
NumberFormatId = 14, //Date
FormatId = 0,
ApplyNumberFormat = true
});
// StyleIndex = 2, A style that works for DateTime (Date and Time)
CellFormats.Append(new CellFormat()
{
BorderId = 0,
FillId = 0,
FontId = 0,
NumberFormatId = 22, //Date Time
FormatId = 0,
ApplyNumberFormat = true
});
CellFormats.Count = (uint)CellFormats.ChildElements.Count;
// Create "cellStyles" node.
var CellStyles = new CellStyles();
CellStyles.Append(new CellStyle()
{
Name = "Normal",
FormatId = 0,
BuiltinId = 0
});
CellStyles.Count = (uint)CellStyles.ChildElements.Count;
// Append all nodes in order.
StyleSheet.Append(Fonts);
StyleSheet.Append(Fills);
StyleSheet.Append(Borders);
StyleSheet.Append(CellStyleFormats);
StyleSheet.Append(CellFormats);
StyleSheet.Append(CellStyles);
return StyleSheet;
}

Reading excel file in c# using Microsoft DocumentFormat.OpenXml SDK

I am using Microsoft DocumentFormat.OpenXml SDK to read data from excel file.
While doing so I am taking into consideration if a cell has blank values(If Yes, read that too).
Now, facing issues with one of the excel sheets where the workSheet.SheetDimension is null hence the code is throwing an exception.
Code used :
class OpenXMLHelper
{
// A helper function to open an Excel file using OpenXML, and return a DataTable containing all the data from one
// of the worksheets.
//
// We've had lots of problems reading in Excel data using OLEDB (eg the ACE drivers no longer being present on new servers,
// OLEDB not working due to security issues, and blatantly ignoring blank rows at the top of worksheets), so this is a more
// stable method of reading in the data.
//
public static DataTable ExcelWorksheetToDataTable(string pathFilename)
{
try
{
DataTable dt = new DataTable();
string dimensions = string.Empty;
using (FileStream fs = new FileStream(pathFilename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using (SpreadsheetDocument document = SpreadsheetDocument.Open(fs, false))
{
// Find the sheet with the supplied name, and then use that
// Sheet object to retrieve a reference to the first worksheet.
//Sheet theSheet = document.WorkbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == worksheetName).FirstOrDefault();
//--Sheet theSheet = document.WorkbookPart.Workbook.Descendants<Sheet>().FirstOrDefault();
//--if (theSheet == null)
//-- throw new Exception("Couldn't find the worksheet: "+ theSheet.Id);
// Retrieve a reference to the worksheet part.
//WorksheetPart wsPart = (WorksheetPart)(document.WorkbookPart.GetPartById(theSheet.Id));
//--WorksheetPart wsPart = (WorksheetPart)(document.WorkbookPart.GetPartById(theSheet.Id));
WorkbookPart workbookPart = document.WorkbookPart;
WorksheetPart wsPart = workbookPart.WorksheetParts.FirstOrDefault();
Worksheet workSheet = wsPart.Worksheet;
dimensions = workSheet.SheetDimension.Reference.InnerText; // Get the dimensions of this worksheet, eg "B2:F4"
int numOfColumns = 0;
int numOfRows = 0;
CalculateDataTableSize(dimensions, ref numOfColumns, ref numOfRows);
//System.Diagnostics.Trace.WriteLine(string.Format("The worksheet \"{0}\" has dimensions \"{1}\", so we need a DataTable of size {2}x{3}.", worksheetName, dimensions, numOfColumns, numOfRows));
SheetData sheetData = workSheet.GetFirstChild<SheetData>();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
string[,] cellValues = new string[numOfColumns, numOfRows];
int colInx = 0;
int rowInx = 0;
string value = "";
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
// Iterate through each row of OpenXML data, and store each cell's value in the appropriate slot in our [,] string array.
foreach (Row row in rows)
{
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
// *DON'T* assume there's going to be one XML element for each column in each row...
Cell cell = row.Descendants<Cell>().ElementAt(i);
if (cell.CellValue == null || cell.CellReference == null)
continue; // eg when an Excel cell contains a blank string
// Convert this Excel cell's CellAddress into a 0-based offset into our array (eg "G13" -> [6, 12])
colInx = GetColumnIndexByName(cell.CellReference); // eg "C" -> 2 (0-based)
rowInx = GetRowIndexFromCellAddress(cell.CellReference) - 1; // Needs to be 0-based
// Fetch the value in this cell
value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
value = stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
cellValues[colInx, rowInx] = value;
}
}
// Copy the array of strings into a DataTable.
// We don't (currently) make any attempt to work out which columns should be numeric, rather than string.
for (int col = 0; col < numOfColumns; col++)
{
//dt.Columns.Add("Column_" + col.ToString());
dt.Columns.Add(cellValues[col, 0]);
}
//foreach (Cell cell in rows.ElementAt(0))
//{
// dt.Columns.Add(GetCellValue(doc, cell));
//}
for (int row = 0; row < numOfRows; row++)
{
DataRow dataRow = dt.NewRow();
for (int col = 0; col < numOfColumns; col++)
{
dataRow.SetField(col, cellValues[col, row]);
}
dt.Rows.Add(dataRow);
}
dt.Rows.RemoveAt(0);
//#if DEBUG
// // Write out the contents of our DataTable to the Output window (for debugging)
// string str = "";
// for (rowInx = 0; rowInx < maxNumOfRows; rowInx++)
// {
// for (colInx = 0; colInx < maxNumOfColumns; colInx++)
// {
// object val = dt.Rows[rowInx].ItemArray[colInx];
// str += (val == null) ? "" : val.ToString();
// str += "\t";
// }
// str += "\n";
// }
// System.Diagnostics.Trace.WriteLine(str);
//#endif
return dt;
}
}
}
catch (Exception ex)
{
return null;
}
}
public static void CalculateDataTableSize(string dimensions, ref int numOfColumns, ref int numOfRows)
{
// How many columns & rows of data does this Worksheet contain ?
// We'll read in the Dimensions string from the Excel file, and calculate the size based on that.
// eg "B1:F4" -> we'll need 6 columns and 4 rows.
//
// (We deliberately ignore the top-left cell address, and just use the bottom-right cell address.)
try
{
string[] parts = dimensions.Split(':'); // eg "B1:F4"
if (parts.Length != 2)
throw new Exception("Couldn't find exactly *two* CellAddresses in the dimension");
numOfColumns = 1 + GetColumnIndexByName(parts[1]); // A=1, B=2, C=3 (1-based value), so F4 would return 6 columns
numOfRows = GetRowIndexFromCellAddress(parts[1]);
}
catch
{
throw new Exception("Could not calculate maximum DataTable size from the worksheet dimension: " + dimensions);
}
}
public static int GetRowIndexFromCellAddress(string cellAddress)
{
// Convert an Excel CellReference column into a 1-based row index
// eg "D42" -> 42
// "F123" -> 123
string rowNumber = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^0-9 _]", "");
return int.Parse(rowNumber);
}
public static int GetColumnIndexByName(string cellAddress)
{
// Convert an Excel CellReference column into a 0-based column index
// eg "D42" -> 3
// "F123" -> 5
var columnName = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^A-Z_]", "");
int number = 0, pow = 1;
for (int i = columnName.Length - 1; i >= 0; i--)
{
number += (columnName[i] - 'A' + 1) * pow;
pow *= 26;
}
return number - 1;
}
}[enter image description here][1]
The SheetDimension part is optional (and therefor you cannot always rely on it being up to date). See the following part of the OpenXML specification:
18.3.1.35 dimension (Worksheet Dimensions)
This element specifies the used range of the worksheet. It specifies the row and column bounds of
used cells in the worksheet. This is optional and is not required.
Used cells include cells with formulas, text content, and cell
formatting. When an entire column is formatted, only the first cell in
that column is considered used.
So an Excel file without any SheetDimension part is perfectly valid, so you should not rely on it being present in an Excel file.
Therefor I'd suggest to simply parse all Row elements contained in the SheetData part, and "count" the number of rows (instead of reading the SheetDimensions part to get the number of rows / columns). This way you can also take into account that an Excel file may contain completely blank rows in-between the data.

Having trouble reading excel file with the OpenXML sdk

I have a function that reads from an excel file and stores the results in a DataSet. I have another function that writes to an excel file. When I try to read from a regular human-generated excel file, the excel reading function returns a blank DataSet, but when I read from the excel file generated by the writing function, it works perfectly fine. The function then will not work on a regular generated excel file, even when I just copy and paste the contents of the function generated excel file. I finally tracked it down to this, but I have no idea where to go from here. Is there something wrong with my code?
Here is the excel generating function:
public static Boolean writeToExcel(string fileName, DataSet data)
{
Boolean answer = false;
using (SpreadsheetDocument excelDoc = SpreadsheetDocument.Create(tempPath + fileName, SpreadsheetDocumentType.Workbook))
{
WorkbookPart workbookPart = excelDoc.AddWorkbookPart();
workbookPart.Workbook = new Workbook();
WorksheetPart worksheetPart = workbookPart.AddNewPart<WorksheetPart>();
Sheets sheets = excelDoc.WorkbookPart.Workbook.AppendChild<Sheets>(new Sheets());
Sheet sheet = new Sheet()
{
Id = excelDoc.WorkbookPart.GetIdOfPart(worksheetPart),
SheetId = 1,
Name = "Page1"
};
sheets.Append(sheet);
CreateWorkSheet(worksheetPart, data);
answer = true;
}
return answer;
}
private static void CreateWorkSheet(WorksheetPart worksheetPart, DataSet data)
{
Worksheet worksheet = new Worksheet();
SheetData sheetData = new SheetData();
UInt32Value currRowIndex = 1U;
int colIndex = 0;
Row excelRow;
DataTable table = data.Tables[0];
for (int rowIndex = -1; rowIndex < table.Rows.Count; rowIndex++)
{
excelRow = new Row();
excelRow.RowIndex = currRowIndex++;
for (colIndex = 0; colIndex < table.Columns.Count; colIndex++)
{
Cell cell = new Cell()
{
CellReference = Convert.ToString(Convert.ToChar(65 + colIndex)),
DataType = CellValues.String
};
CellValue cellValue = new CellValue();
if (rowIndex == -1)
{
cellValue.Text = table.Columns[colIndex].ColumnName.ToString();
}
else
{
cellValue.Text = (table.Rows[rowIndex].ItemArray[colIndex].ToString() != "") ? table.Rows[rowIndex].ItemArray[colIndex].ToString() : "*";
}
cell.Append(cellValue);
excelRow.Append(cell);
}
sheetData.Append(excelRow);
}
SheetFormatProperties formattingProps = new SheetFormatProperties()
{
DefaultColumnWidth = 20D,
DefaultRowHeight = 20D
};
worksheet.Append(formattingProps);
worksheet.Append(sheetData);
worksheetPart.Worksheet = worksheet;
}
while the reading function is as following:
public static void readInventoryExcel(string fileName, ref DataSet set)
{
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
{
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
int count = -1;
foreach (Row r in sheetData.Elements<Row>())
{
if (count >= 0)
{
DataRow row = set.Tables[0].NewRow();
row["SerialNumber"] = r.ChildElements[1].InnerXml;
row["PartNumber"] = r.ChildElements[2].InnerXml;
row["EntryDate"] = r.ChildElements[3].InnerXml;
row["RetirementDate"] = r.ChildElements[4].InnerXml;
row["ReasonForReplacement"] = r.ChildElements[5].InnerXml;
row["RetirementTech"] = r.ChildElements[6].InnerXml;
row["IncludeInMaintenance"] = r.ChildElements[7].InnerXml;
row["MaintenanceTech"] = r.ChildElements[8].InnerXml;
row["Comment"] = r.ChildElements[9].InnerXml;
row["Station"] = r.ChildElements[10].InnerXml;
row["LocationStatus"] = r.ChildElements[11].InnerXml;
row["AssetName"] = r.ChildElements[12].InnerXml;
row["InventoryType"] = r.ChildElements[13].InnerXml;
row["Description"] = r.ChildElements[14].InnerXml;
set.Tables[0].Rows.Add(row);
}
count++;
}
}
}
I think this is caused by the fact that you have only one sheet whereas Excel has three. I'm not certain but I think the sheets are returned in reverse order so you should change the line:
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
to
WorksheetPart worksheetPart = workbookPart.WorksheetParts.Last();
It might be safer to search for the WorksheetPart if you can identify it by the sheet name. You need to find the Sheet first then use the Id of that to find the SheetPart:
private WorksheetPart GetWorksheetPartBySheetName(WorkbookPart workbookPart, string sheetName)
{
//find the sheet first.
IEnumerable<Sheet> sheets = workbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>().Where(s => s.Name == sheetName);
if (sheets.Count() > 0)
{
string relationshipId = sheets.First().Id.Value;
WorksheetPart worksheetPart = (WorksheetPart)workbookPart.GetPartById(relationshipId);
return worksheetPart;
}
return null;
}
You can then use:
WorksheetPart worksheetPart = GetWorksheetPartBySheetName(workbookPart, "Sheet1");
There are a couple of other things I've noticed whilst looking at your code which you may (or may not!) be interested in:
In your code you are only reading the InnerXml so it might not matter to you but the way Excel stores strings is different to the way you are writing them so reading an Excel generated file may not give you the values you expect. In your example you are writing the string directly to the cell like this:
But Excel uses a SharedStrings concept where all strings are written to a separate XML file called sharedStrings.xml. That file contains the strings used in the Excel file with a reference and it's that value that is stored in the cell value in the sheet XML.
The sharedString.xml looks like this:
And the Cell then looks like this:
The 47 in the <v> element is a reference to the 47th shared string. Note that the type (the t attribute) in your generated XML is str but the type in the Excel generated file is s. This denotes yours is an inline string and theirs is a shared string.
You can read the SharedStrings just as you would any other part:
var stringTable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
if (stringTable != null)
{
sharedString = stringTable.SharedStringTable.ElementAt(int.Parse(value)).InnerText;
}
Secondly, if you look at the cell reference that your code generates and the cell reference that Excel generates you can see you are only outputting the column and not the row (e.g. you output A instead of A1). To fix this you should change the line:
CellReference = Convert.ToString(Convert.ToChar(65 + colIndex)),
to
CellReference = Convert.ToString(Convert.ToChar(65 + colIndex) + rowIndex.ToString()),
I hope that helps.
I ran into a similar issue a while back trying to do this for Word documents (procedurally generated worked fine, but human-generated did not). I found this tool to be very helpful:
http://www.microsoft.com/en-us/download/details.aspx?id=30425
Basically, it looks at a file and shows you the code that Microsoft would generate to read it, as well as the xml structure of the file itself. As usual for Microsoft products, there are quite a few menus and it's not very intuitive, but after clicking around for a bit you will be able to see exactly what is going on with any two files. I would recommend you open a working excel file and a non-working one and compare the difference to see what's causing your issue.
Below is the OpenXML code that I use to read in a particular Worksheet from an Excel file, into a DataTable.
First, here's how you'd call it:
DataTable dt = OpenXMLHelper.ExcelWorksheetToDataTable("C:\\SQL Server\\SomeExcelFile.xlsx", "Mikes Worksheet");
And here's the code:
public class OpenXMLHelper
{
public static DataTable ExcelWorksheetToDataTable(string pathFilename, string worksheetName)
{
DataTable dt = new DataTable(worksheetName);
using (SpreadsheetDocument document = SpreadsheetDocument.Open(pathFilename, false))
{
// Find the sheet with the supplied name, and then use that
// Sheet object to retrieve a reference to the first worksheet.
Sheet theSheet = document.WorkbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == worksheetName).FirstOrDefault();
if (theSheet == null)
throw new Exception("Couldn't find the worksheet: " + worksheetName);
// Retrieve a reference to the worksheet part.
WorksheetPart wsPart = (WorksheetPart)(document.WorkbookPart.GetPartById(theSheet.Id));
Worksheet workSheet = wsPart.Worksheet;
string dimensions = workSheet.SheetDimension.Reference.InnerText; // Get the dimensions of this worksheet, eg "B2:F4"
int numOfColumns = 0;
int numOfRows = 0;
CalculateDataTableSize(dimensions, ref numOfColumns, ref numOfRows);
System.Diagnostics.Trace.WriteLine(string.Format("The worksheet \"{0}\" has dimensions \"{1}\", so we need a DataTable of size {2}x{3}.", worksheetName, dimensions, numOfColumns, numOfRows));
SheetData sheetData = workSheet.GetFirstChild<SheetData>();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
string[,] cellValues = new string[numOfColumns, numOfRows];
int colInx = 0;
int rowInx = 0;
string value = "";
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
// Iterate through each row of OpenXML data
foreach (Row row in rows)
{
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
// *DON'T* assume there's going to be one XML element for each item in each row...
Cell cell = row.Descendants<Cell>().ElementAt(i);
if (cell.CellValue == null || cell.CellReference == null)
continue; // eg when an Excel cell contains a blank string
// Convert this Excel cell's CellAddress into a 0-based offset into our array (eg "G13" -> [6, 12])
colInx = GetColumnIndexByName(cell.CellReference); // eg "C" -> 2 (0-based)
rowInx = GetRowIndexFromCellAddress(cell.CellReference)-1; // Needs to be 0-based
// Fetch the value in this cell
value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
value = stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
cellValues[colInx, rowInx] = value;
}
dt.Rows.Add(dataRow);
}
// Copy the array of strings into a DataTable
for (int col = 0; col < numOfColumns; col++)
dt.Columns.Add("Column_" + col.ToString());
for (int row = 0; row < numOfRows; row++)
{
DataRow dataRow = dt.NewRow();
for (int col = 0; col < numOfColumns; col++)
{
dataRow.SetField(col, cellValues[col, row]);
}
dt.Rows.Add(dataRow);
}
#if DEBUG
// Write out the contents of our DataTable to the Output window (for debugging)
string str = "";
for (rowInx = 0; rowInx < maxNumOfRows; rowInx++)
{
for (colInx = 0; colInx < maxNumOfColumns; colInx++)
{
object val = dt.Rows[rowInx].ItemArray[colInx];
str += (val == null) ? "" : val.ToString();
str += "\t";
}
str += "\n";
}
System.Diagnostics.Trace.WriteLine(str);
#endif
return dt;
}
}
private static void CalculateDataTableSize(string dimensions, ref int numOfColumns, ref int numOfRows)
{
// How many columns & rows of data does this Worksheet contain ?
// We'll read in the Dimensions string from the Excel file, and calculate the size based on that.
// eg "B1:F4" -> we'll need 6 columns and 4 rows.
//
// (We deliberately ignore the top-left cell address, and just use the bottom-right cell address.)
try
{
string[] parts = dimensions.Split(':'); // eg "B1:F4"
if (parts.Length != 2)
throw new Exception("Couldn't find exactly *two* CellAddresses in the dimension");
numOfColumns = 1 + GetColumnIndexByName(parts[1]); // A=1, B=2, C=3 (1-based value), so F4 would return 6 columns
numOfRows = GetRowIndexFromCellAddress(parts[1]);
}
catch
{
throw new Exception("Could not calculate maximum DataTable size from the worksheet dimension: " + dimensions);
}
}
public static int GetRowIndexFromCellAddress(string cellAddress)
{
// Convert an Excel CellReference column into a 1-based row index
// eg "D42" -> 42
// "F123" -> 123
string rowNumber = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^0-9 _]", "");
return int.Parse(rowNumber);
}
public static int GetColumnIndexByName(string cellAddress)
{
// Convert an Excel CellReference column into a 0-based column index
// eg "D42" -> 3
// "F123" -> 5
var columnName = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^A-Z_]", "");
int number = 0, pow = 1;
for (int i = columnName.Length - 1; i >= 0; i--)
{
number += (columnName[i] - 'A' + 1) * pow;
pow *= 26;
}
return number - 1;
}
}
Just to mention, some of our company's Excel Worksheets have one or more blank rows at the top. Strangely, this prevented some other OpenXML libraries from reading in such Worksheets properly.
This code deliberately creates a DataTable with one value for each of the cells in the Worksheet, even the blank ones at the top.

Categories

Resources