Read Excel Cell Format - c#

I'm working on a program that reads data from an Excel file and puts it into our database. The program is written in C# in Visual Studio 2010, and I'm using the NPOI library.
In the past, I was able to read the spreadsheet row by row and cell by cell to get the data, but the new format of the Excel file does not allow me to do this easily. (The Excel file is supplied by another user, so I can't really make big changes to it.)
There are several "tables" in one sheet (using borders and headers for each column name), and I will need to get data mainly from the tables but sometimes from outside the tables too.
I was wondering: if I read the spreadsheet row by row (which is what I'm a bit more familiar with), is there a way to tell that I have reached a table? Is there a way to read the "format" of a cell?
What I mean is, for example, "this cell has borders around it, so a table starts at this row" or "the text in this cell is bold, so this row is the header row for a new table."
In the past I was only able to read the text of the spreadsheet, not the format/style. I've been searching the internet and can only find how to set the style for an output Excel file, not how to read the format from an input file.
Any help is appreciated, thanks!

It would be better to have the various tables in your source workbook defined as named ranges with known names. Then you can get the associated area like this -
using System.IO;
using System.Windows;
using NPOI.SS.UserModel;
using NPOI.XSSF.UserModel;
// ...
using (var file = new FileStream(workbookLocation, FileMode.Open, FileAccess.Read))
{
var workbook = new XSSFWorkbook(file);
var nameInfo = workbook.GetName("TheTable");
var tableRange = nameInfo.RefersToFormula;
// Do stuff with the table
}
If you have no control over the source spreadsheet and cannot define the tables as named ranges, you can read the cell formats as you suggest. Here is an example of reading the TopBorder style -
using (var file = new FileStream(workbookLocation, FileMode.Open, FileAccess.Read))
{
    var workbook = new XSSFWorkbook(file);
    var sheet = workbook.GetSheetAt(0);
    for (int rowNo = 0; rowNo <= sheet.LastRowNum; rowNo++)
    {
        var row = sheet.GetRow(rowNo);
        if (row == null) // null is when the row only contains empty cells
            continue;
        // LastCellNum is the index of the last cell plus one, so use < rather than <=
        for (int cellNo = 0; cellNo < row.LastCellNum; cellNo++)
        {
            var cell = row.GetCell(cellNo);
            if (cell == null) // null is when the cell is empty
                continue;
            var topBorderStyle = cell.CellStyle.BorderTop;
            if (topBorderStyle != BorderStyle.None)
            {
                MessageBox.Show(string.Format("Cell row: {0} column: {1} has TopBorder: {2}", cell.Row.RowNum, cell.ColumnIndex, topBorderStyle));
            }
        }
    }
}
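The question also asks about detecting bold header text. A minimal sketch of how that could look is below, assuming an NPOI version whose IFont exposes the IsBold property (older releases expose Boldweight instead). The class and method names (CellFormatReader, IsBold, HasFullBorder) are hypothetical helpers, not part of NPOI.

```csharp
using NPOI.SS.UserModel;

// Hypothetical helper class; only the NPOI members it calls are library API.
public static class CellFormatReader
{
    // Returns true when the font referenced by the cell's style is bold.
    // Assumption: IFont.IsBold is available in the NPOI version in use.
    public static bool IsBold(IWorkbook workbook, ICell cell)
    {
        IFont font = cell.CellStyle.GetFont(workbook);
        return font.IsBold;
    }

    // Returns true when the cell has a border on all four sides.
    public static bool HasFullBorder(ICell cell)
    {
        ICellStyle style = cell.CellStyle;
        return style.BorderTop != BorderStyle.None
            && style.BorderBottom != BorderStyle.None
            && style.BorderLeft != BorderStyle.None
            && style.BorderRight != BorderStyle.None;
    }
}
```

Inside the cell loop above, a call such as CellFormatReader.IsBold(workbook, cell) could mark a row as a candidate header row, and HasFullBorder(cell) could mark a cell as part of a table.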

Related

c# read specific columns from excel file

I have a table like this
and I added checkbox elements to a form like this
I want to add the text of each checked checkbox as a column in a DataGridView, then read the checked columns from the Excel file.
If Date, Time, and Price are checked, the DataGridView will look like this
then get the full Date column from the Excel file and add it to the Date column in the DataGridView.
My code to add the text of the checked checkboxes as columns in the DataGridView:
DataTable dt = new DataTable();
foreach (Control checkbox in pnl.Controls)
if (checkbox.GetType() == typeof(CheckBox) && ((CheckBox) checkbox).Checked)
{
string txt = ((CheckBox)checkbox).Text;
dt.Columns.Add(new DataColumn(txt, typeof(object)));
}
datagrid.DataSource = dt;
There are a few steps which are needed before being able to grab data from an Excel file. So without your code, I don't know how much of this you have done. But here is the full explanation.
First
You have to add a reference to the Microsoft.Office.Interop.Excel dll (this assumes you aren't using epplus or another Nuget package). This link describes how to do this: How to reference Microsoft.Office.Interop.Excel dll?
Second
Include this library in whichever source file it is needed, and initialize an excel application (you'll also want InteropServices included):
using Microsoft.Office.Interop.Excel;
using System.Runtime.InteropServices;
// Global excel app object to be used anywhere
public Application ExcelApp;
// Initializes an excel application by looking for an active one,
// and creating a new one if none are active
public void InitExcelApp()
{
    try
    {
        ExcelApp = (Application)Marshal.GetActiveObject("Excel.Application");
    }
    catch (COMException)
    {
        ExcelApp = new Application
        {
            Visible = true
        };
    }
}
Third
You must initialize a workbook object. Here is how I do it, but my solution assumes you know the path to the desired Excel workbook:
Workbook myWorkbook = null;
// Checks open workbooks first
// Note that the path must be windows style. Ex: "C:\\Desktop\\myWorkbook.xlsx"
foreach (Workbook openWorkbook in ExcelApp.Workbooks)
{
    if (openWorkbook.FullName == "<path to workbook>")
    {
        myWorkbook = openWorkbook;
    }
}
// If no open workbooks were found at the known path, try opening one
if (myWorkbook is null)
{
    myWorkbook = ExcelApp.Workbooks.Open("<path to workbook>", Editable: true);
}
Fourth
Get the data you want. There are several ways to do this, and mine might not be the most efficient, but it works. In the code below I have included parameter names to hopefully make it more understandable.
// Requires: using System.Collections.Generic;
// Gets the names of the checked items from the DataTable with columns you already added
// You could get these names from your checkboxes' names if you preferred
List<string> checkedItems = new List<string>();
foreach (System.Data.DataColumn column in dt.Columns)
{
    checkedItems.Add(column.ColumnName);
}
// UsedRange and Cells belong to a worksheet, not the workbook.
// Assumption: the data is on the first worksheet.
Worksheet sheet = (Worksheet)myWorkbook.Worksheets[1];
// This dictionary holds info helpful for getting the correct data from excel:
// Key: a string containing the name of the column header, i.e. Date, Time, Price, etc.
// Value: an integer containing the number of that column in excel
Dictionary<string, int> excelColumnsInfo = new Dictionary<string, int>();
for (int columnNum = 1; columnNum <= sheet.UsedRange.Columns.Count; columnNum++)
{
    string columnHeader = sheet.Cells[RowIndex: 1, ColumnIndex: columnNum].Value2.ToString();
    if (checkedItems.Contains(columnHeader))
    {
        excelColumnsInfo.Add(columnHeader, columnNum);
    }
}
// Populates the data table with the data you need
// Start at row 2 to ignore the excel sheet's column headers
for (int rowNum = 2; rowNum <= sheet.UsedRange.Rows.Count; rowNum++)
{
    System.Data.DataRow newRow = dt.NewRow();
    foreach (KeyValuePair<string, int> columnInfo in excelColumnsInfo)
    {
        newRow[columnName: columnInfo.Key] = sheet.Cells[RowIndex: rowNum, ColumnIndex: columnInfo.Value].Value2.ToString();
    }
    dt.Rows.Add(newRow);
}

How to get the address of a cell

I'm trying to find the address of a cell in an .xlsx file using C#, but I can't find the right solution for it.
using IronXL;
var workbook = IronXL.WorkBook.Load("email list.xlsx");
var sheet = workbook.WorkSheets.First();
var cells = sheet["A1:C494"];
foreach(var cell in cells)
{
Console.WriteLine(cell.Value);
// print cell address
}
Thank you for your time

Updating a cell value breaks row style. NPOI, C#

Good evening. Recently I was trying to update a cell's value in an .xls file using the NPOI library (C#), but when I do that with cell.SetCellValue("anyvalue");,
I see the changes only in some cells. The other cells are just empty.
I tried saving the cell's style and reapplying it using cell.CellStyle, but the result is the same.
Generally speaking, only half of the values that should be filled in end up in place.
I'm using the code below, where namesAndValue[i, 0] contains the cell name and namesAndValue[i, 1] contains its value.
using (FileStream rstr = new FileStream(currentPath + $"/{excelName}", FileMode.Open, FileAccess.Read))
{
var workbook = new HSSFWorkbook(rstr);
var sheet = workbook.GetSheetAt(0);
using (FileStream wstr = new FileStream(currentPath + $"/{excelName}", FileMode.Open, FileAccess.Write))
{
for (int i = 0; i < values.Count; i++)
{
var cr = new CellReference(namesAndValue[i, 0]);
var row = sheet.CreateRow(cr.Row);
var cell = row.CreateCell(cr.Col);
cell.SetCellValue(namesAndValue[i, 1]);
}
workbook.Write(wstr);
wstr.Close();
}
rstr.Close();
}
When you call sheet.CreateRow(0), the existing first row of the sheet is replaced by a new, empty row with no style. The same goes for row.CreateCell().
Because you call CreateRow for every value, each call discards whatever was already in that row, so only the last value written to each row survives.
I think this might be the problem.
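One way to avoid this is to reuse the existing row and cell and create them only when they are missing. The following is a minimal sketch of that idea, not a tested fix for the exact workbook in the question; the class and method names (CellUpdater, SetValue) are hypothetical, while GetRow, CreateRow, GetCell, CreateCell and CellReference are NPOI members.

```csharp
using NPOI.SS.UserModel;
using NPOI.SS.Util;

// Hypothetical helper class illustrating "get or create" for rows and cells.
public static class CellUpdater
{
    // Sets a cell's value without discarding the row or cell that is already there.
    // cellName is an A1-style reference such as "B7".
    public static void SetValue(ISheet sheet, string cellName, string value)
    {
        var cr = new CellReference(cellName);
        // GetRow/GetCell return null when the row or cell does not exist yet.
        IRow row = sheet.GetRow(cr.Row) ?? sheet.CreateRow(cr.Row);
        ICell cell = row.GetCell(cr.Col) ?? row.CreateCell(cr.Col);
        cell.SetCellValue(value);
    }
}
```

In the loop from the question, the three lines that create the row, create the cell and set the value could then be replaced by a call such as CellUpdater.SetValue(sheet, namesAndValue[i, 0], namesAndValue[i, 1]). Existing cells keep their style because they are never recreated.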

How to read the state of a checkbox in an excel file with EPPlus ( C# )

The title already explains my problem pretty well. I have an excel file which contains checkboxes and I would like to read their state (checked or not) using the EPPlus library.
I am not sure if this is even supported. So far I have found no documentation or examples for that specific problem using EPPlus.
If you add a cell link to the checkbox, then pulling the value is straightforward. I don't believe that the Drawing object contains the value.
using System.Linq;
using OfficeOpenXml;
using OfficeOpenXml.Drawing;
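namespace EPPlus
{
    // The enclosing class was missing from the snippet; the name CheckboxExample is a placeholder.
    public class CheckboxExample
    {
        // BaseDirectory is assumed to be defined elsewhere in this class.
        public void Run()
        {
            var excelFile = new System.IO.FileInfo(System.IO.Path.Combine(BaseDirectory, "Excel", "Checkbox.xlsx"));
            using (ExcelPackage excel = new ExcelPackage(excelFile))
            {
                ExcelWorksheet sheet = excel.Workbook.Worksheets.SingleOrDefault(a => a.Name == "Sheet1");
                // The drawing object can be found by name, but it does not carry the checked state
                ExcelDrawing checkbox2 = sheet.Drawings.SingleOrDefault(a => a.Name == "Check Box 2");
                // G5 is the cell linked to the checkbox; it holds the TRUE/FALSE state
                var value = sheet.Cells["G5"].Value.ToString();
            }
        }
    }
}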
For an existing Excel file, just designate a cell somewhere and link it to the checkbox. Write the true/false value directly to that cell (NOT to the checkbox). The checkbox will automatically reflect the value of the cell.
You can put all designated cells in a certain column, then hide that column. :)

How do I insert Excel cells without creating a corrupt file?

I'm using the OpenXML SDK to update the contents of an Excel spreadsheet. When inserting cells into an Excel row they must be inserted in the correct order or the file will not open properly in Excel. I'm using the following code to find the first cell that will be after the cell I am inserting. This code comes almost directly from the OpenXML SDK documentation
public static Cell GetFirstFollowingCell(Row row, string newCellReference)
{
Cell refCell = null;
foreach (Cell cell in row.Elements<Cell>())
{
if (string.Compare(cell.CellReference.Value, newCellReference, true) > 0)
{
refCell = cell;
break;
}
}
return refCell;
}
When I edit files with this code and then open them in Excel, Excel reports that the file is corrupted. Excel is able to repair the file, but most of the data is removed from the workbook. Why does this result in file corruption?
Side note: I tried two different .NET Excel libraries before turning to the painfully low-level OpenXML SDK. NPOI created spreadsheets with corruption and EPPlus threw an exception whenever I tried to save. I was using the most recent version of each.
The code you are using is seriously flawed. This is very unfortunate, seeing as it comes from the documentation. It may work acceptably for spreadsheets that only use the first 26 columns but will fail miserably when confronted with "wider" spreadsheets. The first 26 columns are named alphabetically, A-Z. Columns 27-52 are named AA-AZ. Columns 53-78 are named BA-BZ. (You should notice the pattern.)
Cell "AA1" should come after all cells with a single character column name (i.e. "A1" - "Z1"). Let's examine the current code comparing cell "AA1" with cell "B1".
string.Compare("B1", "AA1", true) returns the value 1
The code interprets this to mean that "AA1" should be placed before cell "B1".
The calling code will insert "AA1" before "B1" in the XML.
At this point the cells will be out of order and the Excel file is corrupted. Clearly, string.Compare by itself is not a sufficient test to determine the proper order of cells in a row. A more sophisticated comparison is required.
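// Requires: using System.Text.RegularExpressions;
// Returns true when the current cell's column sorts after the new cell's column,
// i.e. the current cell is one the new cell must be inserted before.
public static bool IsNewCellAfterCurrentCell(string currentCellReference, string newCellReference)
{
    var columnNameRegex = new Regex("[A-Za-z]+");
    var currentCellColumn = columnNameRegex.Match(currentCellReference).Value;
    var newCellColumn = columnNameRegex.Match(newCellReference).Value;
    var currentCellColumnLength = currentCellColumn.Length;
    var newCellColumnLength = newCellColumn.Length;
    if (currentCellColumnLength == newCellColumnLength)
    {
        // Same number of letters: alphabetical order decides
        var comparisonValue = string.Compare(currentCellColumn, newCellColumn, StringComparison.OrdinalIgnoreCase);
        return comparisonValue > 0;
    }
    // A longer column name always sorts after a shorter one ("AA" follows "Z")
    return currentCellColumnLength > newCellColumnLength;
}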
If you wanted to place a new cell in column "BC" and you were comparing to cell "D5", you would use IsNewCellAfterCurrentCell("D5", "BC5"). Substituting the new comparison function into the original code and simplifying with LINQ (this needs using System.Linq):
public static Cell GetFirstFollowingCell(Row row, string newCellReference)
{
var rowCells = row.Elements<Cell>();
return rowCells.FirstOrDefault(c => IsNewCellAfterCurrentCell(c.CellReference.Value, newCellReference));
}
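As an alternative to comparing lengths and then strings, the column letters can be converted to a number and compared numerically. The following is a minimal sketch of that approach, not code from the OpenXML SDK; the names CellOrder, ColumnIndex and IsColumnAfter are hypothetical, and it assumes plain references such as "BC5" without "$" markers.

```csharp
using System;

// Hypothetical helper class; not part of the OpenXML SDK.
public static class CellOrder
{
    // Converts the leading column letters of a reference such as "BC5"
    // to a 1-based column index (A = 1, Z = 26, AA = 27, ...).
    public static int ColumnIndex(string cellReference)
    {
        int index = 0;
        foreach (char ch in cellReference)
        {
            if (!char.IsLetter(ch))
                break; // stop at the row number
            index = index * 26 + (char.ToUpperInvariant(ch) - 'A' + 1);
        }
        return index;
    }

    // Returns true when the column of 'reference' comes after the column of 'other'.
    public static bool IsColumnAfter(string reference, string other)
    {
        return ColumnIndex(reference) > ColumnIndex(other);
    }
}
```

With this helper, the predicate in GetFirstFollowingCell could be written as c => CellOrder.IsColumnAfter(c.CellReference.Value, newCellReference). Comparing integers avoids the special case for column names of different lengths.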
