c# wpf importing excel performance - c#

I'm importing large Excel files of varying size (250+ columns by 100,000 rows). The number of columns and their names can change from file to file, and so can the number of rows, which hold the values.
I'm using Interop to pull the data into a DataTable that is bound to a DataGrid. However, I'm importing each row individually, and it can take 25+ minutes to complete for larger files.
public Task<DataTable> ParseExcel(string filePath)
{
return Task.Run(() =>
{
var excelApp = new Microsoft.Office.Interop.Excel.Application();
var excelBook = excelApp.Workbooks.Open(filePath, 0, true, 5, "", "", true,
Microsoft.Office.Interop.Excel.XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
var excelSheet = (Microsoft.Office.Interop.Excel.Worksheet)excelBook.Worksheets.Item[1];
Microsoft.Office.Interop.Excel.Range excelRange = excelSheet.UsedRange;
DataTable sessiondt = new DataTable();
object[,] value = excelRange.Value;
int columnsCount = value.GetLength(1);
for (var colCnt = 1; colCnt <= columnsCount; colCnt++)
{
sessiondt.Columns.Add((string)value[1, colCnt], typeof(string));
}
int rowsCount = value.GetLength(0);
for (var rowCnt = 2; rowCnt <= rowsCount; rowCnt++)
{
var dataRow = sessiondt.NewRow();
for (var colCnt = 1; colCnt <= columnsCount; colCnt++)
{
dataRow[colCnt - 1] = value[rowCnt, colCnt];
}
sessiondt.Rows.Add(dataRow);
}
excelBook.Close(true);
excelApp.Quit();
return sessiondt;
});
}
Rather than inserting each row individually, it would probably be faster to put it all into a List of a custom object that can be data bound, but I'm unsure how to do this.
Also, I want to bind the columns in a way that doesn't require me to code the column names in advance. I'll be trying to display these in graphs, and being able to populate the column names into a ComboBox automatically would be a lot easier.
Thank you in advance. I am new to C# and WPF and still learning.

Interop has some specific uses, but if you just want to get the data from an Excel file, Interop is probably the slowest and most cumbersome way to go.
An Excel file, either .xls or .xlsx can be treated and accessed just like a database.
As long as you have data in rows and columns in your worksheets, you can open an OleDb connection to it and run queries against it.
The Sheet names take the place of the table name and if you have column headings in the first row of your sheet, those are the field names.
You just need the proper connection string:
https://www.connectionstrings.com/excel/
One of the 'watch-outs' with this method of retrieving your data is that data types are automatically assigned based on the first few entries in each column. You cannot override this behavior (you used to be able to, but no longer). This can cause a problem if you have alphanumeric values in a column and the first dozen or so entries are all numbers: the column will then be automatically assigned a numeric type. If later rows of this column contain mixed alphanumeric values or plain text, those entries will be ignored (not imported) because they don't match the data type that was initially assigned.
The only good way around that is to programmatically unzip and parse out the contents of the xml files.
If you have consistent data throughout, then this isn't an issue.
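As a rough sketch of the OleDb approach described above: the class and method names below are made up for illustration, and the connection string assumes an .xlsx file with headers in the first row and the Microsoft ACE OLE DB provider installed (with the same bitness as your application). Check the connection strings page linked above for the variant that matches your file type.

```csharp
using System.Data;
using System.Data.OleDb;

// Hypothetical helper, not part of any library.
public static class ExcelOleDbReader
{
    // Builds an ACE OLE DB connection string for an .xlsx file.
    // HDR=YES tells the provider to treat the first row as column names.
    public static string BuildConnectionString(string filePath)
    {
        return "Provider=Microsoft.ACE.OLEDB.12.0;" +
               "Data Source=" + filePath + ";" +
               "Extended Properties=\"Excel 12.0 Xml;HDR=YES\";";
    }

    // Reads every row of one worksheet into a DataTable.
    // The sheet name followed by '$' takes the place of the table name.
    public static DataTable LoadSheet(string filePath, string sheetName)
    {
        var table = new DataTable();
        using (var connection = new OleDbConnection(BuildConnectionString(filePath)))
        using (var adapter = new OleDbDataAdapter("SELECT * FROM [" + sheetName + "$]", connection))
        {
            // Fill opens the connection if needed and closes it again afterwards.
            adapter.Fill(table);
        }
        return table;
    }
}
```

Because the column names come from the header row, nothing about the columns has to be hard-coded; a call such as `ExcelOleDbReader.LoadSheet(filePath, "Sheet1")` could replace the Interop loop inside the `Task.Run` from the question.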

Here is another way to do this, fast and straightforward, using the GemBox.Spreadsheet library:
public Task<DataTable> ParseExcel(string filePath)
{
return Task.Run(() =>
{
ExcelFile excelBook = ExcelFile.Load(filePath);
ExcelWorksheet excelSheet = excelBook.Worksheets[0];
CreateDataTableOptions options = new CreateDataTableOptions();
return excelSheet.CreateDataTable(options);
});
}
Also check this DataTable from Sheet example.
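Whichever loader you use, the result is a DataTable, so the column names the question wants in a ComboBox can be read from its Columns collection instead of being coded in advance. A minimal sketch, with a helper name invented for this example:

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class DataTableColumns
{
    // Returns the column names of a DataTable in their original order.
    public static List<string> GetColumnNames(DataTable table)
    {
        return table.Columns
                    .Cast<DataColumn>()
                    .Select(column => column.ColumnName)
                    .ToList();
    }
}
```

In WPF the returned list can then be assigned to the ComboBox's ItemsSource property once the import task has completed.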

Related

How to add a new excel column between two columns in a existing worksheet

I would like to add a column that already contains cell values between two columns (or at the end) of a worksheet in an existing workbook that I load.
So I have a function that sets the column values I need:
private static Workbook SetIndicatorsWorkbook()
{
var workbook = new Workbook(WorkbookFormat.Excel2007MacroEnabled);
var worksheet = workbook.Worksheets.Add("Unit & Integration Tests");
//Don't worry about the team and jenkinsBuildTeams variables
foreach (var team in jenkinsBuildTeams)
{
worksheet.Rows[posX].Cells[0].Value = lastnbUnitTests + lastnbIntegrationTests;
posX += 1;
}
return workbook;
}
Then, in the main function, I want to add this column (which is workbook.Worksheets[0].Columns[0]) to a loaded workbook:
private static void Main()
{
//The workbook I need to update
Workbook workbook = Workbook.Load("file.xlsx");
Workbook temp = SetIndicatorsWorkbook();
WorksheetColumn wc = temp.Worksheets[0].Columns[0];
//The issue is that Worksheet's Columns collection has no "Insert" property
workbook.Save("file.xlsx");
}
The Columns collection of the Worksheet has an Insert method that will shift data/formatting just as would happen in Excel. This was added in the 2014 volume 2 version. You can read more about that in the help topic or the api documentation. Note I've linked to the WPF version help but the Insert method is available in the other platforms as well.

Adding datatype to table columns with ClosedXML

I'm using ClosedXML to create an xlsx-file with some data. In the Excel-sheet I have a table with data. In the columns of the table (not the column of the sheet) I want to specify datatypes.
/* This DataTable is created and populated somewhere else in my program */
DataTable dt = new DataTable();
ws.Cell(t.Position.Row, t.Position.Column).InsertTable(dt);
IXLTables allTables = ws.Tables;
var table = allTables.ElementAt(i);
int j = 1;
/* The options object hold the excel datatypes for each column*/
foreach (var c in option.Tables.ElementAt(i).Columns)
{
table.Column(j).DataType = c.Type;
j++;
}
i++;
The datatype is found and added in the foreach-loop from an options object, how this works in my program is probably not important.
The problem is that when I add a datatype to a column in the table, it includes the header of the table. Since the header is "Text" and the type I specify might be Number, I get an error. Does anyone have an idea how to make it ignore the header line of the table and simply apply the datatype to the cells below?
Thanks in advance.
Since the data type can be set on a per-column basis in Excel, it's not unreasonable to expect the same from ClosedXML. As you've found, however, it doesn't work like that. The way I do it is to define a range that selects the whole column except the header. I've not worked with tables the way you are, but the following snippet should give you a direction.
IXLWorksheet sheet = wb.Worksheets.Add("Sheet1");
sheet.Cell(1, 1).InsertTable(dt);
foreach (var column in sheet.ColumnsUsed())
{
string columnLetter = column.ColumnLetter();
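// Range from row 2 (just below the header) down to the last used row of this column
string rng = $"{columnLetter}2:{columnLetter}{sheet.RangeUsed().RowCount()}";
// Placeholder: substitute the data type you need (older ClosedXML versions name this enum XLCellValues)
sheet.Range(rng).DataType = XLDataType.Number;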
}

Open XML populate excel table

I have an Excel template with an empty one-column table. I need to populate it with some string values (this is needed for setting lookups using data validation, but I guess it doesn't really matter).
I got as far as getting the Table object, and I assume I should use the Append method:
var workBookPart = doc.WorkbookPart;
var lookupsSheet = (Sheet)workBookPart.Workbook.Sheets.FirstOrDefault(x => (x is Sheet && ((Sheet)x).Name == "Lookups"));
var worksheetPart = (WorksheetPart)workBookPart.GetPartById(lookupsSheet.Id);
var table = worksheetPart.TableDefinitionParts.FirstOrDefault(x => x.Table.DisplayName == "ValuesTable")?.Table;
Can someone enlighten me about the correct way of adding rows to such a table? Thanks!
I would suggest you use the ClosedXML library to set the values of cells in your worksheet. By using ClosedXML, you will be able to populate the cells you want in the following fashion:
var workbook = new XLWorkbook();
var ws = workbook.Worksheets.Add("Demo");
// Set the values for the cells
ws.Cell(1, 1).Value = "Value";
ws.Cell(2, 1).Value = 1;
ws.Cell(3, 1).Value = 2;
ws.Cell(4, 1).Value = 3;
ws.Cell(5, 1).Value = true;
Note that you can set the value of a cell to a string, an integer, and a boolean without doing any explicit casting. You can set the value of a cell without doing any explicit casting to other types as well, as it is explained in the following link: Cell Values.
For more information regarding the ClosedXML library please refer to the documentation.
As a side note, I was really eager to use Open XML to manipulate Excel spreadsheets but I found ClosedXML way easier to use.

Update Excel From C# Based on Column Name

I am looking for a way to update an existing Excel spreadsheet with data from a SQL query via C# putting data in columns based on column header. For instance, if I have the following returned query/dataset
Width  Height  Length
2      2       2
2      3       4
3      4       5
And I have an Excel workbook like so:
Width  Height  Area      Length  Volume
               =(A1*B1)          =(C1*D1)
               =(A2*B2)          =(C2*D2)
               =(A3*B3)          =(C3*D3)
I would like to insert Width, Length and Height into the workbook without affecting Area or Volume, i.e.:
Width  Height  Area      Length  Volume
2      2       =(A1*B1)  2       =(C1*D1)
2      3       =(A2*B2)  4       =(C2*D2)
3      4       =(A3*B3)  5       =(C3*D3)
Is there a way to specify in code that the Width from the dataset should go in the Width column, etc.? I am currently using the EPPlus package to do Excel tasks.
There are a couple of approaches for this:
1. You can hard-code the Excel column name's index
2. You can resolve it and put it in a dictionary
I'm going to go with option 2 so it's easier for you. However, it rests on a couple of assumptions:
You know how to get the Worksheet object of your workbook through Interop.Excel
You are able to specify the row where you start entering the data, and the row where all the column names are
Here's the code:
using System.Collections.Generic;
using System.Data;
using Microsoft.Office.Interop.Excel;
//Interop.Excel also declares a DataTable type, so pin the name to System.Data
using DataTable = System.Data.DataTable;
public void SyncData(Worksheet ws, DataTable dt, int startRow){
//Get the columns and their corresponding indexes in excel
Dictionary<string, int> columnMap = ExcelColumnResolver(ws, dt, 1);
//The row number in excel you're starting to update from
int currRow = startRow;
//Iterate through the rows and the columns of each row
foreach(DataRow row in dt.Rows){
foreach(DataColumn column in dt.Columns){
//Only update columns we have mapped
if(columnMap.ContainsKey(column.ColumnName)){
ws.Cells[currRow, columnMap[column.ColumnName]] = row[column.ColumnName];
}
}
currRow++;
}
}
//columnsRow = Row in which the column names are located (non-zero indexed)
public Dictionary <string, int> ExcelColumnResolver(Worksheet ws, DataTable dt, int columnsRow) {
Dictionary<string, int> nameToExcelIdxMap = new Dictionary<string, int>();
//Highest Excel column index to scan for header names; adjust to your sheet
int maxColumnCount = 10;
//Excel cells start at index 1
for (int i = 1; i <= maxColumnCount; i++) {
//Read the cell's value; calling ToString() on the cell itself returns the COM type name
object header = ((Range)ws.Cells[columnsRow, i]).Value2;
string col = header?.ToString();
if (col != null && dt.Columns.Contains(col)){
nameToExcelIdxMap[col] = i;
}
}
return nameToExcelIdxMap;
}
Here's a tutorial on how you can access the Excel worksheet
Runtime is O(rows × columns). For better performance I would recommend:
Populating the data in an object array and using the Worksheet.Range property to set a group of cells at once, instead of updating the cells one by one.
Parallelizing the writing of rows to the object array, since there are no dependencies between the rows.
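A minimal sketch of the first recommendation. It builds one N x 1 array per mapped column rather than a single block for the whole table, so that unmapped columns such as Area and Volume are never overwritten. ColumnBlockBuilder and ToColumnBlock are names invented for this example; the Interop assignment is shown only as a comment and assumes the columnMap and startRow from the code above.

```csharp
using System;
using System.Data;

// Hypothetical helper, not part of any library.
public static class ColumnBlockBuilder
{
    // Copies one DataTable column into an N x 1 array, the shape a
    // single-column Excel range accepts in one assignment.
    public static object[,] ToColumnBlock(DataTable table, string columnName)
    {
        int rowCount = table.Rows.Count;
        var block = new object[rowCount, 1];
        for (int i = 0; i < rowCount; i++)
        {
            object value = table.Rows[i][columnName];
            // Database nulls become plain nulls so Excel leaves the cell empty.
            block[i, 0] = value == DBNull.Value ? null : value;
        }
        return block;
    }
}

// Assumed use with Interop, once per entry in columnMap (col is the Excel
// column index, rowCount is dt.Rows.Count and must be at least 1):
// Range target = ws.Range[ws.Cells[startRow, col], ws.Cells[startRow + rowCount - 1, col]];
// target.Value2 = block;
```

This replaces one COM call per cell with one COM call per column, which is usually where most of the time goes when writing through Interop.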
Using EPPlus and assuming GetDataFromSql returns DataTable, you can use the following code:
var data = GetDataFromSql();
using (var excelPackage = new ExcelPackage(new FileInfo(@"C:\Proj\Sample\Book1.xlsx")))
{
var worksheet = excelPackage.Workbook.Worksheets.First();
// Get locations of column names inside excel:
var headersLocation = new Dictionary<string, Tuple<int, int>>();
foreach (DataColumn col in data.Columns)
{
var cell = worksheet.Cells.First(x => x.Text.Equals(col.ColumnName));
headersLocation.Add(col.ColumnName, new Tuple<int, int>(cell.Start.Row, cell.Start.Column));
}
for (var i = 0; i < data.Rows.Count; i++)
{
foreach (DataColumn col in data.Columns)
{
// update the value
worksheet.Cells[headersLocation[col.ColumnName].Item1 + i + 1,
headersLocation[col.ColumnName].Item2
].Value = data.Rows[i][col];
}
}
excelPackage.Save();
}

Select range in aspose

Do you know of an equivalent in Aspose.Cells to this VBA code:
Range(Selection, Selection.End(xlToRight)).Select
It seems that it's only possible to select the last cell in the entire row:
public Aspose.Cells.Cell EndCellInRow ( Int32 rowIndex )
Or the last cell on the right within a range:
public Aspose.Cells.Cell EndCellInRow ( Int32 startRow, Int32 endRow, Int32 startColumn, Int32 endColumn )
but then you must know more or less how big your table is going to be.
I found this from 2009: http://www.aspose.com/community/forums/permalink/196519/196405/showthread.aspx but that will not resolve my problem, as I may have many tables in a sheet, both horizontally and vertically, and I can't predict where they are going to be.
Edit1:
Sorry if this is a dumb question, but Ctrl+Shift+Arrow is such a common operation that I can't believe it would not be implemented, so I'm making sure I really have to reinvent the wheel.
Aspose.Cells exposes the tables in a worksheet through the 'Worksheet.ListObjects' property. 'ListObjects' is a collection of 'ListObject' items, each of which represents a table in an Excel sheet. So if a worksheet has more than one table, the ListObjects collection gives convenient access to every one of them. Each 'ListObject' in turn has a property named 'DataRange', which specifies all the cells inside the table. DataRange can be used for the following operations on a table:
To apply styles/formatting to the cells in the table
To get the data values
To merge or move the cells in the range
To export contents
To get an enumerator to traverse the table cells
To select cells from DataRange, you can traverse it to get all the cells in a row (this can also be done for a column).
Any operation you would apply after selecting table cells with Ctrl+Shift+Arrow can be performed through a workbook object as follows:
Workbook workbook = new Workbook(new FileStream("book1.xls", FileMode.Open));
if (workbook.Worksheets[0].ListObjects.Count > 0)
{
foreach (ListObject table in workbook.Worksheets[0].ListObjects)
{
Style st = new Style();
st.BackgroundColor = System.Drawing.Color.Aqua;
st.ForegroundColor = System.Drawing.Color.Black;
st.Font.Name = "Agency FB";
st.Font.Size = 16;
st.Font.Color = System.Drawing.Color.DarkRed;
StyleFlag stFlag = new StyleFlag();
stFlag.All = true;
table.DataRange.ApplyStyle(st, stFlag);
}
}
workbook.Save("output.xls");
There is also some worthy information available in Aspose docs about Table styles and applying formatting on a ListObject. For getting last Table cell in a certain row or column, I am sure this will help:
int iFirstRowIndex = table.DataRange.FirstRow;
int iFirstColumnIndex = table.DataRange.FirstColumn;
int iLastRowIndex = table.DataRange.RowCount + iFirstRowIndex;
int iLastColumnIndex = table.DataRange.ColumnCount + iFirstColumnIndex;
for (int rowIndex = 0; rowIndex < table.DataRange.RowCount; rowIndex++)
{
//Get last cell in every row of table
Cell cell = worksheet.Cells.EndCellInColumn(rowIndex + iFirstRowIndex, rowIndex + iFirstRowIndex, (short)iFirstColumnIndex, (short)(iLastColumnIndex - 1));
//display cell value
System.Console.WriteLine(cell.Value);
}
