How to use EPPlus with cells spanning several rows - c#

I want to import an Excel file using EPPlus.
The problem is that some cells span more than one row, which causes a problem.
My Excel file looks like this (in reality there are more tests: test2, test3, ...):
I can only get the first column with this algorithm, and getting the second column will be more complicated.
//this is the list that contains the applications (column 2)
ICollection<Application> applications = new List<Application>();
int i = 0;
for (int j = workSheet.Dimension.Start.Row;
j <= workSheet.Dimension.End.Row;
j=i+1)
{
//this is the object that contains the first column
//and also a list for the second column (for each domain there is a list of applications (column 2))
Domaine domaine = new Domaine();
i += 1;
//add here and not last row
while (workSheet.Cells[i, 1].Text == "" && i < workSheet.Dimension.End.Row)
{
i++;
}
if (i > workSheet.Dimension.End.Row)
break;
domaine.NomDomaine = workSheet.Cells[i, 1].Text;
domaines.Add(domaine);
}
Edit: in other words, is there a way to get the number of rows spanned by one cell, or a way to duplicate the cell's value into each of its rows?
(For example, if I have a cell spanning rows 1 to 14 and only row 5 has the value,)
how can I duplicate that text to all the rows? That would help me solve the problem.

Those are known as Merged cells. Values from merged cells are stored in the .Value property of the first cell in the merged range. This means we need to do just a little bit more work in order to read the value from a merged cell using EPPlus.
EPPlus provides us with a couple of properties that help us get to the correct reference though. Firstly we can use a cell's .Merge property to find out if it is part of a merged range. Then we can use the worksheet's .MergedCells property to find the relevant range. It's then just a matter of finding the first cell in that range and returning the value.
So, in summary:
1. Determine if the cell we need to read from is part of a merged range using .Merge
2. If so, get the address of the merged range using the worksheet's .MergedCells property
3. Read the value from the first cell in the merged range
Putting this together we can derive a little helper method to take a worksheet object and row/col indices in order to return the value:
static string GetCellValueFromPossiblyMergedCell(ExcelWorksheet wks, int row, int col)
{
var cell = wks.Cells[row, col];
if (cell.Merge) //(1.)
{
var mergedId = wks.MergedCells[row, col]; //(2.)
return wks.Cells[mergedId].First().Value.ToString(); //(3.)
}
else
{
return cell.Value.ToString();
}
}
Worked example
If I have a domain class like this:
class ImportedRecord
{
public string ChildName { get; set; }
public string SubGroupName { get; set; }
public string GroupName { get; set; }
}
that I wanted to read from a spreadsheet that looked like this:
Then I could use this method:
static List<ImportedRecord> ImportRecords()
{
var ret = new List<ImportedRecord>();
var fInfo = new FileInfo(@"C:\temp\book1.xlsx");
using (var excel = new ExcelPackage(fInfo))
{
var wks = excel.Workbook.Worksheets["Sheet1"];
var lastRow = wks.Dimension.End.Row;
for (int i = 2; i <= lastRow; i++)
{
var importedRecord = new ImportedRecord
{
ChildName = wks.Cells[i, 4].Value.ToString(),
SubGroupName = GetCellValueFromPossiblyMergedCell(wks,i,3),
GroupName = GetCellValueFromPossiblyMergedCell(wks, i, 2)
};
ret.Add(importedRecord);
}
}
return ret;
}
static string GetCellValueFromPossiblyMergedCell(ExcelWorksheet wks, int row, int col)
{
var cell = wks.Cells[row, col];
if (cell.Merge)
{
var mergedId = wks.MergedCells[row, col];
return wks.Cells[mergedId].First().Value.ToString();
}
else
{
return cell.Value.ToString();
}
}
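Once every row carries its own group value, as the helper above arranges, rebuilding the "one domain, many applications" shape from the question is a plain grouping step. The following is only a sketch under that assumption: FlatRow and RowGrouping are hypothetical names, not part of EPPlus, and the rows are assumed to have been read already.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one imported row whose merged value was already resolved.
public sealed class FlatRow
{
    public string GroupName { get; set; }
    public string ChildName { get; set; }
}

public static class RowGrouping
{
    // Groups flat rows by GroupName; groups come out in the order their key first appears.
    public static List<KeyValuePair<string, List<string>>> GroupChildren(IEnumerable<FlatRow> rows)
    {
        return rows
            .GroupBy(r => r.GroupName)
            .Select(g => new KeyValuePair<string, List<string>>(
                g.Key,
                g.Select(r => r.ChildName).ToList()))
            .ToList();
    }
}
```

Each resulting pair holds a group name and the list of child names that shared the merged cell, which could then be mapped onto objects such as Domaine and Application.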

Related

How can I only get visible cells in an Excel spreadsheet using OpenXML?

I am pulling data from cells in an Excel spreadsheet using OpenXML in C#. I only want to pull data if the cell is visible on the spreadsheet. I can get all the cells with the code below:
var cells = part.Worksheet.Descendants<Cell>();
I can then use the "CellReference.Value" property to figure out what column the cell belongs to.
The code below will give me the visible columns on the spreadsheet.
var visible_columns = part.Worksheet.Descendants<Column>().Where(a => a.Hidden == null || a.Hidden.Value == false);
I am now stuck trying to programmatically associate the cell object with its column object. From what I can tell there is no property on the column object to get its name. Ideally I would get the column name from the "CellReference.Value" property on the cell object using a regular expression. Once I had that I could use it to get the associated column object, which I could then use to check the Hidden property.
I also looked at the "Parent" property of the cell object, but this gives me a Row object, which doesn't solve my issue. Can anyone point me in the right direction?
Thanks
Here is how you can read cells that are not inside the hidden rows or columns:
static void Main()
{
using (var spreadsheetDocument = SpreadsheetDocument.Open("input.xlsx", false))
{
var workbookPart = spreadsheetDocument.WorkbookPart;
var worksheetPart = workbookPart.WorksheetParts.First();
var worksheet = worksheetPart.Worksheet;
var columns = worksheet.Elements<Columns>().First();
// Get names of the hidden columns.
var hiddenColumnNames = new HashSet<string>();
foreach (var column in columns.Elements<Column>().Where(c=> c.Hidden != null && c.Hidden.Value))
for (uint min = column.Min, max = column.Max; min <= max; min++)
hiddenColumnNames.Add(GetColumnName(min));
var sheetData = worksheet.Elements<SheetData>().First();
foreach (var row in sheetData.Elements<Row>())
{
// Skip cells that are in hidden row.
if (row.Hidden != null && row.Hidden.Value)
continue;
foreach (var cell in row.Elements<Cell>())
{
// Skip cell that is in hidden column.
var columnName = cell.CellReference.Value.Replace(row.RowIndex.ToString(), "");
if (hiddenColumnNames.Contains(columnName))
continue;
// TODO: read visible cell ...
}
}
}
}
static string GetColumnName(uint columnNumber)
{
string columnName = "";
while (columnNumber > 0)
{
uint modulo = (columnNumber - 1) % 26;
columnName = Convert.ToChar(65 + modulo).ToString() + columnName;
columnNumber = (uint)((columnNumber - modulo) / 26);
}
return columnName;
}
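The question mentions pulling the column name out of the cell reference with a regular expression. As an alternative to removing the row index with Replace, a minimal sketch of that idea could look like the following; CellRef and ColumnName are hypothetical names introduced only for illustration, and the input is assumed to be an A1-style reference such as "AB12".

```csharp
using System;
using System.Text.RegularExpressions;

public static class CellRef
{
    // Returns the leading letters of an A1-style reference, e.g. "AB12" gives "AB".
    // Returns an empty string when the reference does not start with a letter.
    public static string ColumnName(string cellReference)
    {
        Match m = Regex.Match(cellReference, "^[A-Za-z]+");
        return m.Success ? m.Value.ToUpperInvariant() : "";
    }
}
```

The result can be looked up in the hiddenColumnNames set in the same way as the columnName variable above.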

Get Merged Cell Area with EPPLus

I'm using EPPlus to read excel files.
I have a single cell that is part of merged cells. How do I get the merged range that this cell is part of?
For example:
Assume Range ("A1:C1") has been merged.
Given Range "B1", its Merge property will be true, but there isn't a way to get the merged range given a single cell.
How do you get the merged range?
I was hoping for a .MergedRange which would return Range("A1:C1")
There is no such property out of the box but the worksheet has a MergedCells property with an array of all the merged cell addresses in the worksheet and a GetMergeCellId() method which will give you the index for a given cell address.
We can therefore combine these into a little extension method you can use to get the address. Something like this:
public static string GetMergedRangeAddress(this ExcelRange @this)
{
if (@this.Merge)
{
var idx = @this.Worksheet.GetMergeCellId(@this.Start.Row, @this.Start.Column);
return @this.Worksheet.MergedCells[idx-1]; //the array is 0-indexed but the mergeId is 1-indexed...
}
else
{
return @this.Address;
}
}
which you can use as follows:
using (var excel = new ExcelPackage(new FileInfo("inputFile.xlsx")))
{
var ws = excel.Workbook.Worksheets["sheet1"];
var b3address = ws.Cells["B3"].GetMergedRangeAddress();
}
(Note that in the event that you use this method on a multi-celled range it will return the merged cell address for the first cell in the range only)
You can get all merged cells from the worksheet, so
you can find the merged range a specific cell belongs to using the following:
public string GetMergedRange(ExcelWorksheet worksheet, string cellAddress)
{
ExcelWorksheet.MergeCellsCollection mergedCells = worksheet.MergedCells;
foreach (var merged in mergedCells)
{
ExcelRange range = worksheet.Cells[merged];
ExcelCellAddress cell = new ExcelCellAddress(cellAddress);
if (range.Start.Row<=cell.Row && range.Start.Column <= cell.Column)
{
if (range.End.Row >= cell.Row && range.End.Column >= cell.Column)
{
return merged.ToString();
}
}
}
return "";
}
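The two nested conditions in the loop above amount to a bounds check: the cell lies inside the merged range when its row and column both fall between the range's start and end. A library-free sketch of the same check on plain address strings is shown below; A1Range and its members are hypothetical helpers, and only simple addresses such as "A1:C1" and "B1" (no sheet names, no "$" signs) are assumed.

```csharp
using System;

public static class A1Range
{
    // Converts column letters to a 1-based index: "A" gives 1, "AA" gives 27.
    static int ColumnIndex(string letters)
    {
        int n = 0;
        foreach (char c in letters.ToUpperInvariant())
            n = n * 26 + (c - 'A' + 1);
        return n;
    }

    // Splits an address such as "B12" into row 12 and column 2.
    static void Parse(string cell, out int row, out int col)
    {
        int i = 0;
        while (i < cell.Length && char.IsLetter(cell[i])) i++;
        col = ColumnIndex(cell.Substring(0, i));
        row = int.Parse(cell.Substring(i));
    }

    // True when cellAddress lies inside rangeAddress (inclusive on all sides).
    // A single-cell range such as "B2" is treated as starting and ending at that cell.
    public static bool Contains(string rangeAddress, string cellAddress)
    {
        string[] parts = rangeAddress.Split(':');
        Parse(parts[0], out int r1, out int c1);
        Parse(parts[parts.Length - 1], out int r2, out int c2);
        Parse(cellAddress, out int r, out int c);
        return r >= r1 && r <= r2 && c >= c1 && c <= c2;
    }
}
```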
Update:
Turns out that there is a much easier way using EPPLUS, just do the following:
var mergedadress = worksheet.MergedCells[row, column];
For example, if B1 is in a merged range "A1:C1":
var mergedadress = worksheet.MergedCells[1, 2]; //value of mergedadress will be "A1:C1".
2 is the column number because B is the 2nd column.
This will provide you exact width of merged cells:
workSheet.Cells[workSheet.MergedCells[row, col]].Columns
Not a direct answer, as Stewart's answer is perfect, but I was led here looking for a way to get the value of a cell, whether it's part of a larger merged cell or not, so I improved on Stewart's code:
public static string GetVal(this ExcelRange @this)
{
if (@this.Merge)
{
var idx = @this.Worksheet.GetMergeCellId(@this.Start.Row, @this.Start.Column);
string mergedCellAddress = @this.Worksheet.MergedCells[idx - 1];
string firstCellAddress = @this.Worksheet.Cells[mergedCellAddress].Start.Address;
return @this.Worksheet.Cells[firstCellAddress].Value?.ToString()?.Trim() ?? "";
}
else
{
return @this.Value?.ToString()?.Trim() ?? "";
}
}
And call it like this
var worksheet = package.Workbook.Worksheets[i];
var rowCount = worksheet.Dimension.Rows;
var columnCount = worksheet.Dimension.Columns;
for (int row = 1; row <= rowCount; row++)
{
for (int col = 1; col <= columnCount; col++)
{
string val = worksheet.Cells[row, col].GetVal();
}
}

Update all objects where create new one

I have a Languages class in which I use ExcelLibrary. I have an .xls file with three columns. The first is used to check whether the key phrase is used in the document, and then there is one column for each language I use. I would like to create a TranslationField object for every row of the document. I try to do that, but every time I create a new object, all the objects that were created before take the values of the last object created. Can you please explain where I am going wrong and how I can correct this issue?
This is where the mistake happens:
TranslationField tnf = new TranslationField();
tnf.Used = false;
tnf.Strings = values;
Translations.Add(sKey, tnf);
public class Languages
{
public static bool Setup()
{
SupportedLanguages.Clear();
SupportedLanguages.Add(csDefaultLang);
try
{
Workbook book = Workbook.Load(sPath);
Worksheet sheet = book.Worksheets[0];
KeyStringHelper values = new KeyStringHelper();
TranslationNeedle tnl;
List<string> columns = new List<string>();
string sKey = "";
// traverse rows by Index
for (int rowIndex = sheet.Cells.FirstRowIndex; rowIndex <= sheet.Cells.LastRowIndex; rowIndex++)
{
Row row = sheet.Cells.GetRow(rowIndex);
row.FirstColIndex = 1;
for (int colIndex = row.FirstColIndex; colIndex <= row.LastColIndex; colIndex++)
{
Cell cell = row.GetCell(colIndex);
// the first excel row is assumed to be columns names
if (rowIndex == sheet.Cells.FirstRowIndex)
{
//Columns names correctly formatted
columns.Add(char.ToUpper(cell.StringValue[0]) + cell.StringValue.Substring(1).ToUpper());
//Register every language inside the xls
SupportedLanguages.Add(char.ToUpper(cell.StringValue[0]) + cell.StringValue.Substring(1).ToUpper());
}
else
{
if (colIndex - row.FirstColIndex == 0)
sKey = cell.StringValue.Replace("\r\n", "\n");
else
values.Add(columns[colIndex - row.FirstColIndex], cell.StringValue.Replace("\r\n", "\n"));
}
}
// add the cell values to Translations Dictionary
if (rowIndex != sheet.Cells.FirstRowIndex)
{
TranslationField tnf = new TranslationField();
tnf.Used = false;
tnf.Strings = values;
Translations.Add(sKey, tnf);
}
}
//other stuff
}
}
Here is the class TranslationField
class TranslationField
{
public bool Used = false;
public KeyStringHelper Strings = new KeyStringHelper();
}
You're reusing the same KeyStringHelper instance (values) for every TranslationField. So every TranslationField instance in your Translations collection is referencing the same KeyStringHelper instance.
It looks like you need to move the line
KeyStringHelper values = new KeyStringHelper();
inside the outer for loop.
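The effect can be reproduced without ExcelLibrary. In the sketch below, a plain Dictionary<string, string> stands in for KeyStringHelper (an assumption, since that class is not shown in the question), and SharedInstanceDemo is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class SharedInstanceDemo
{
    // Same shape as the question: one values object created once and stored for every key.
    public static Dictionary<string, Dictionary<string, string>> BuildShared(string[] keys)
    {
        var result = new Dictionary<string, Dictionary<string, string>>();
        var values = new Dictionary<string, string>(); // created once, outside the loop
        foreach (string key in keys)
        {
            values["English"] = key + "-en"; // overwrites what earlier rows stored
            result.Add(key, values);         // every entry points at the same object
        }
        return result;
    }

    // Suggested fix: create a fresh values object for each row.
    public static Dictionary<string, Dictionary<string, string>> BuildSeparate(string[] keys)
    {
        var result = new Dictionary<string, Dictionary<string, string>>();
        foreach (string key in keys)
        {
            var values = new Dictionary<string, string>(); // new instance per row
            values["English"] = key + "-en";
            result.Add(key, values);
        }
        return result;
    }
}
```

With keys "a" and "b", BuildShared leaves both entries holding the value written for "b", while BuildSeparate keeps each row's own value.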

DataTable normalize blanks cells

I have not found a method to normalize a DataTable that came from an Excel with merged cells. When I get the DataTable from that Excel, only the first cell has the value, others are blank.
An example of this DataTable is:
and the expected result:
To summarize: blank cells should be filled with the value of the nearest cell above that has a value, since that is what the merged cells in Excel represented.
I'm using Excel.dll to read this Excel file, and it doesn't fill the cells automatically, which is why I'm looking for a way to do it in C#.
I suppose the logic should be: if a cell is blank, use the value of the cell above. The logic seems clear, but I'm having trouble writing the code to apply it.
This is just a sample; in the end, I'm looking for a method that works for whatever columns and rows the DataTable has.
Edit:
Thanks for your quick feedback.
Attached is what I have so far. It handles only one column and has errors, since it doesn't take care of the first and last rows, but it shows the idea. What I'm trying to achieve is a method that works for any number of columns and rows (it would be OK if the columns had fixed names; if I get more columns I will adapt it).
private void NormalizeDataTable(DataTable dtRawTable)
{
DataTable dtFinalized = new DataTable();
dtFinalized.Columns.Add("Col1", typeof(String));
string previousValue = "";
for (int index = 0; index <= dtRawTable.Rows.Count; index++)
{
DataRow dr = dtFinalized.NewRow();
if (index != 0 || index == dtRawTable.Rows.Count -1)
{
if (dtRawTable.Rows[index]["Modelo"].ToString() == "")
{
dr["Col1"] = previousValue;
}
else
{
dr["Col1"] = Convert.ToString(dtRawTable.Rows[index]["Modelo"].ToString());
previousValue = (string)dr["Col1"];
}
}
dtFinalized.Rows.Add(dr);
dtFinalized.AcceptChanges();
}
}
Here is the function I'm using in my project for the same requirement.
public static DataTable AutoFillBlankCellOfTable(DataTable outputTable)
{
for (int i = 0; i < outputTable.Rows.Count; i++)
{
for (int j = 0; j < outputTable.Columns.Count; j++)
{
if (outputTable.Rows[i][j] == DBNull.Value)
{
if (i > 0)
outputTable.Rows[i][j] = outputTable.Rows[i - 1][j];
}
}
}
return outputTable;
}
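The function above fills only cells that are DBNull.Value. The question's own code compares against an empty string, so depending on how the reader populates the table, blank cells may arrive as "" instead. A variant that treats both as blank might look like this sketch; DataTableFill and FillDown are hypothetical names, and whitespace-only strings are assumed to count as blank.

```csharp
using System;
using System.Data;

public static class DataTableFill
{
    // Copies the value from the row above into every blank cell, top to bottom,
    // so a run of blanks inherits the last non-blank value in that column.
    public static void FillDown(DataTable table)
    {
        for (int i = 1; i < table.Rows.Count; i++) // row 0 has nothing above it
        {
            for (int j = 0; j < table.Columns.Count; j++)
            {
                object value = table.Rows[i][j];
                bool blank = value is DBNull
                    || (value is string s && s.Trim().Length == 0);
                if (blank)
                    table.Rows[i][j] = table.Rows[i - 1][j];
            }
        }
    }
}
```

Note that this cannot tell a merged cell apart from a cell that was genuinely empty in the sheet, so it is only appropriate for columns where every blank is known to come from a merge.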

how to get the cell column of a range

I'm iterating over a row which I got from a range in order to find a specific word in the cell content, and then I want to get the column where I find it. For example, if I find the desired content at the 19th place, it means that the Excel column is "S".
Here is the code I'm using so far:
Excel.Worksheet xlWorkSheet = GetWorkSheet(currentWorkBook, "sheet");
var row = xlWorkSheet.Rows["5"];
int rowLength = xlWorkSheet.UsedRange.Columns.Count;
Excel.Range currentTitle = row[1]; //in order to iterate only over the 5th row in this example
for (int i = 1; i < rowLength; i++)
{
string title = currentTitle.Value2[1,i];
if (title == null)
{
continue;
}
if (title.Contains(wordToSearch))
{
string column = THIS IS THE QUESTION - WHAT DO I NEED TO WRITE HERE?
Excel.Range valueCell = xlWorkSheet.Range[column + "5"];
return valueCell.Value2;
}
Notice the line with string column, where I need to add the code.
Since you don't want to rely on any calculation, the other option I see is extracting the column letter from the Address property, as in the code below:
Excel.Range currentRange = (Excel.Range)currentTitle.Cells[1, i];
string columnLetter = currentRange.get_AddressLocal(true, false, Excel.XlReferenceStyle.xlA1, missing, missing).Split('$')[0];
string title = null;
if (currentRange.Value2 != null)
{
title = currentRange.Value2.ToString();
}
As you can see, I am forcing a "$" to appear in order to ease the column letter retrieval.
