How do I read in a single column from an Excel spreadsheet?

I'm trying to read a single column from an Excel document. I'd like to read the entire column, but only store the cells that have data. I'd also like to handle the case where one or more cells in the column are empty but there are more values farther down the column. For example:
| Column1 |
|---------|
|bob |
|tom |
|randy |
|travis |
|joe |
| |
|jennifer |
|sam |
|debby |
If I had that column, I don't mind having a value of "" for the row after joe, but I do want it to keep getting values after the blank cell. However, I do not want it to go on for 35,000 lines past debby assuming debby is the last value in the column.
It is also safe to assume that this will always be the first column.
So far, I have this:
Excel.Application myApplication = new Excel.Application();
myApplication.Visible = true;
Excel.Workbook myWorkbook = myApplication.Workbooks.Open("C:\\aFileISelect.xlsx");
Excel.Worksheet myWorksheet = myWorkbook.Sheets["aSheet"] as Excel.Worksheet;
Excel.Range myRange = myWorksheet.get_Range("A:A", Type.Missing);
foreach (Excel.Range r in myRange)
{
MessageBox.Show(r.Text);
}
I've found lots of examples from older versions of .NET that do similar things, but not exactly this, and wanted to make sure I did something that's more modern (assuming the method one would use to do this has changed some amount).
My current code reads the entire column, but includes blank cells after the last value.
EDIT1
I liked Isedlacek's answer below, but I do have a problem with it, that I'm not certain is specific to his code. If I use it in this way:
Excel.Application myApplication = new Excel.Application();
myApplication.Visible = true;
Excel.Workbook myWorkbook = myApplication.Workbooks.Open("C:\\aFileISelect.xlsx");
Excel.Worksheet myWorksheet = myWorkbook.Sheets["aSheet"] as Excel.Worksheet;
Excel.Range myRange = myWorksheet.get_Range("A:A", Type.Missing);
var nonEmptyRanges = myRange.Cast<Excel.Range>()
.Where(r => !string.IsNullOrEmpty(r.Text));
foreach (var r in nonEmptyRanges)
{
MessageBox.Show(r.Text);
}
MessageBox.Show("Finished!");
the Finished! MessageBox never shows. I'm not sure why that happens, but it appears to never actually finish searching. I tried adding a counter to the loop to see if it was just continuously searching through the column, but it doesn't appear to be ... it appears to just stop.
Where the Finished! MessageBox is, I tried to just close the workbook and spreadsheet, but that code never ran (as expected, since the MessageBox never ran).
If I close the Excel spreadsheet manually, I get a COMException:
COMException was unhandled by user code
Additional information: Exception from HRESULT: 0x803A09A2
Any ideas?

The answer depends on whether you want to get the bounding range of the used cells or if you want to get the non-null values from a column.
Here's how you can efficiently get the non-null values from a column. Note that reading the entire tempRange.Value2 property at once is MUCH faster than reading cell by cell, but the tradeoff is that the resulting array can use a lot of memory.
private static IEnumerable<object> GetNonNullValuesInColumn(_Application application, _Worksheet worksheet, string columnName)
{
// get the intersection of the column and the used range on the sheet (this is a superset of the non-null cells)
var tempRange = application.Intersect(worksheet.UsedRange, (Range) worksheet.Columns[columnName]);
// if there is no intersection, there are no values in the column
if (tempRange == null)
yield break;
// get complete set of values from the temp range (potentially memory-intensive)
var value = tempRange.Value2;
// if value is NULL, it's a single cell with no value
if (value == null)
yield break;
// if value is not an array, the temp range was a single cell with a value
if (!(value is Array))
{
yield return value;
yield break;
}
// otherwise, the value is a 2-D array
var value2 = (object[,]) value;
var rowCount = value2.GetLength(0);
for (var row = 1; row <= rowCount; ++row)
{
var v = value2[row, 1];
if (v != null)
yield return v;
}
}
Here's an efficient way to get the minimum range that contains the non-empty cells in a column. Note that I am still reading the entire set of tempRange values at once, and then I use the resulting array (if multi-cell range) to determine which cells contain the first and last values. Then I construct the bounding range after having figured out which rows have data.
private static Range GetNonEmptyRangeInColumn(_Application application, _Worksheet worksheet, string columnName)
{
// get the intersection of the column and the used range on the sheet (this is a superset of the non-null cells)
var tempRange = application.Intersect(worksheet.UsedRange, (Range) worksheet.Columns[columnName]);
// if there is no intersection, there are no values in the column
if (tempRange == null)
return null;
// get complete set of values from the temp range (potentially memory-intensive)
var value = tempRange.Value2;
// if value is NULL, it's a single cell with no value
if (value == null)
return null;
// if value is not an array, the temp range was a single cell with a value
if (!(value is Array))
return tempRange;
// otherwise, the temp range is a 2D array which may have leading or trailing empty cells
var value2 = (object[,]) value;
// get the first and last rows that contain values
var rowCount = value2.GetLength(0);
int firstRowIndex;
for (firstRowIndex = 1; firstRowIndex <= rowCount; ++firstRowIndex)
{
if (value2[firstRowIndex, 1] != null)
break;
}
int lastRowIndex;
for (lastRowIndex = rowCount; lastRowIndex >= firstRowIndex; --lastRowIndex)
{
if (value2[lastRowIndex, 1] != null)
break;
}
// if there are no first and last used row, there is no used range in the column
if (firstRowIndex > lastRowIndex)
return null;
// return the range
return worksheet.Range[tempRange[firstRowIndex, 1], tempRange[lastRowIndex, 1]];
}
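For the case described in the question (keep a "" for blanks in the middle, but stop at the last value), the array returned by Value2 can be trimmed in memory. The following is a minimal sketch and not part of the answer above: the class and method names are hypothetical, and it assumes the input is the 2-D array that Value2 returns for a multi-cell, single-column range.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnValues
{
    // Returns the first column of a Value2-style 2-D array as strings.
    // Blanks between values become "", and everything after the last
    // non-null cell is dropped.
    public static List<string> TrimTrailingBlanks(object[,] values)
    {
        int firstRow = values.GetLowerBound(0);   // 1 for arrays that come from Excel
        int lastRow = values.GetUpperBound(0);
        int column = values.GetLowerBound(1);

        // Walk up from the bottom until a cell with a value is found.
        while (lastRow >= firstRow && values[lastRow, column] == null)
            --lastRow;

        var result = new List<string>();
        for (int row = firstRow; row <= lastRow; ++row)
            result.Add(Convert.ToString(values[row, column]) ?? "");
        return result;
    }
}
```

With the tempRange from the methods above, a call could look like ColumnValues.TrimTrailingBlanks((object[,])tempRange.Value2), provided the single-cell and empty cases have already been handled as shown there.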

If you don't mind losing the empty rows completely:
var nonEmptyRanges = myRange.Cast<Excel.Range>()
.Where(r => !string.IsNullOrEmpty(r.Text));
foreach (var r in nonEmptyRanges)
{
// handle the r
MessageBox.Show(r.Text);
}

/// <summary>
/// Generic method which reads a column from the <paramref name="workSheetToReadFrom"/> sheet provided.<para />
/// The <paramref name="dumpVariable"/> is the variable upon which the column to be read is going to be dumped.<para />
/// The <paramref name="workSheetToReadFrom"/> is the sheet from which the column is going to be read.<para />
/// The <paramref name="initialCellRowIndex"/>, <paramref name="finalCellRowIndex"/> and <paramref name="columnIndex"/> specify the length of the list to be read and the concrete column of the file from which to perform the reading. <para />
/// Note that the type of data which is going to be read needs to be specified as a generic type argument. The method constrains the generic type arguments which can be passed to it to the types which implement the IConvertible interface provided by the framework (e.g. int, double, string, etc.).
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="dumpVariable"></param>
/// <param name="workSheetToReadFrom"></param>
/// <param name="initialCellRowIndex"></param>
/// <param name="finalCellRowIndex"></param>
/// <param name="columnIndex"></param>
static void ReadExcelColumn<T>(ref List<T> dumpVariable, Excel._Worksheet workSheetToReadFrom, int initialCellRowIndex, int finalCellRowIndex, int columnIndex) where T: IConvertible
{
dumpVariable = ((object[,])workSheetToReadFrom.Range[workSheetToReadFrom.Cells[initialCellRowIndex, columnIndex], workSheetToReadFrom.Cells[finalCellRowIndex, columnIndex]].Value2).Cast<object>().ToList().ConvertAll(e => (T)Convert.ChangeType(e, typeof(T)));
}

Related

How to iterate through a specific row in Excel table via Interop?

So, I'm writing a program that reads table data and puts cell values in a List. I got it working, but there is one problem: UsedRange takes all cells on the sheet, so there are more items than I need. Also, when I specify the range with ["A:A", Type.Missing], it gives me an exception:
System.ArgumentException: "HRESULT: 0x80070057 (E_INVALIDARG))"
So my question is: how do I do this correctly?
Code is:
foreach (Excel.Range row in usedRange)
{
for(int i=0; i<lastCell.Row; i++)
{
if (row.Cells[4, i + 1].Value2 != null)
{
personlist.Add(Convert.ToString(row.Cells[4, i + 1].Value2));
}
else { i++; }
}
foreach(var person in personlist) {
Console.WriteLine(person);
}
}
UPD: I need the last used row, which is why I'm using UsedRange. If there are any alternatives, such as checking for != null, I will gladly try them.
I tried giving it a specific range, and tried writing code like the examples in "C# - How do I iterate all the rows in Excel._Worksheet?"
and here
https://overcoder.net/q/236542/программно-получить-последнюю-заполненную-строку-excel-с-помощью-c
but maybe I'm missing something, because there is more than one article about this and none of them works for me.

The problem is that the used range can include empty cells. Excel's rules for this are opaque: if you type a letter in some arbitrary row and then delete it, Excel can still treat that cell as part of the used range. You want your own definition of a 'used range', which is presumably the range of non-blank rows. There are two straightforward ways to implement this yourself, which also gives you control should you want to customize it.
You can filter the list after the fact, removing all blank entries. Or you can process the list in reverse, skipping rows until you find one that matches your criteria:
bool startProcessing = false;
for (int i = lastCell.Row - 1; i >= 0; i--)
{
// the flag lets blank rows in the middle of the data through; only trailing blanks are skipped
if (!startProcessing)
{
// here a row counts as valid when the cell is non-null; substitute your own criterion
if (row.Cells[4, i + 1].Value2 == null)
continue; // still in the trailing blank region
startProcessing = true; // found the last row with data
}
if (row.Cells[4, i + 1].Value2 != null)
{
personlist.Add(Convert.ToString(row.Cells[4, i + 1].Value2));
}
// no else { i++; } here: that was a bug that skipped a row
}
Also, as an aside: when you call i++; in the body of your for loop, the loop header then increments i again, so i advances by 2 and a row is skipped. Use continue; or just remove the else block altogether.
There's probably a way to get a cellRange matching your criteria, but imo doing it yourself can be better - you can ensure it does exactly what you want.

C# Excel VSTO can not convert string to String.Array

I am new to VSTO C# Excel add-ins. I want to find the total count of non-null, non-empty rows in a range. My code looks at the range "A4:E4" and counts the total number of rows.
This is the code :
private void button1_Click(object sender, RibbonControlEventArgs e)
{
Workbook workbook = Globals.ThisAddIn.GetWorkBook("c:\\temp\\testfile.xlsx");
Worksheet mergeSheet = workbook.Worksheets["Data"];
Excel.Range mergeCells = mergeSheet.Range["A4:E4"];
var colValues = (System.Array)mergeCells.Columns[1].Cells.Value;
var strArray = colValues.OfType<object>().Select(o => o.ToString()).ToArray();
var rowCount = strArray.Length;
}
public Excel.Workbook GetWorkBook(string pathName)
{
return (Excel.Workbook)Application.Workbooks.Open(pathName);
}
The line var colValues = (System.Array)mergeCells.Columns[1].Cells.Value; throws this error:
Microsoft.CSharp.RuntimeBinder.RuntimeBinderException: 'Cannot convert type 'string' to 'System.Array''
It works when I have two rows in my range. I have hardcoded range A4:E4 to produce the error. My excel sheet (testfile.xlsx) looks like below:
Any ideas how do I resolve this?
The same line of code works when I have two rows, for example when the range line is changed to:
Excel.Range mergeCells = mergeSheet.Range["A4:E5"];

The problem is that Range.Value can return different types of objects. Among others, it can return
a single value of type String, if the range contains a single cell containing a string or
an array of values, if the range contains more than one cell.
The simplest solution would be to count the number of cells and "wrap" the special "single value" case in an array:
var range = mergeCells.Columns[1].Cells;
var values = (range.Count == 1)
? new object[] { range.Value }
: ((System.Array)range.Value).Cast<object>();

This line is causing me trouble, var colValues = (System.Array)mergeCells.Columns[1].Cells.Value
This row has only one value. Note that same line works when mergeCells range has two rows.
Why it does not work for single cell range:
The Value of a single cell is not an array (it's a Long, String, DateTime, etc.) and won't be cast to an array in that manner. You can see this by testing like below:
var myArray = (System.Array)"hello";
The same failure occurs for other types.
Why it works for multi-cell range:
The Value of a multi-cell range will return a variant array of the individual cell values, which either is, or can be cast to a System.Array
There may be a better resolution, but at least you should be able to do like:
System.Array colValues; // var cannot be used without an initializer
if (mergeCells.Columns[1].Cells.Count > 1)
{
colValues = (System.Array)mergeCells.Columns[1].Cells.Value;
}
else
{
// NB: you may need to cast other data types to string
// or you could use .Cells[1].Text.Split() (Excel cell indexes are 1-based)
colValues = (System.Array)mergeCells.Columns[1].Cells[1].Value.Split();
}
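Both answers above come down to normalizing whatever Range.Value returns before counting. A minimal sketch of that idea as a standalone helper follows; the class and method names are hypothetical, and the helper takes the already-read value as a plain object so that it does not depend on the interop types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeValues
{
    // Normalizes whatever Range.Value returned into a flat sequence of cell values.
    public static IEnumerable<object> AsSequence(object rangeValue)
    {
        if (rangeValue == null)
            return Enumerable.Empty<object>();   // single empty cell
        if (rangeValue is Array array)
            return array.Cast<object>();         // multi-cell range (2-D array)
        return new[] { rangeValue };             // single cell holding a scalar
    }
}
```

Counting the non-empty cells is then something like RangeValues.AsSequence((object)range.Value).Count(o => o != null), regardless of how many cells the range covers.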

How to check if cell value is a number

Cells of Excel spreadsheet are read using Microsoft.Office.Interop.Excel object.
I need to find out formatting for number values and apply accordingly.
For instance I have 19.0000 value but when it is read Value2 will be "19" however I need to keep it "19.0000"
Excel.Range sheetRange = xlWorksheet.Range["A1", lastCell];
var cell = sheetRange.Cells[row, col];
var value = cell.Value2; // reads back as 19, the trailing zeros are gone
There is a NumberFormat property that returns formatting string like "0.0000" that I could use, but I can't find out how to check if cell value is a number.

A slightly hacky way around it is to add an apostrophe at the beginning; Excel won't try to format the value then.
For instance:
var value = "19.0000"; // keep the text form; a double would lose the trailing zeros
cell.Value = "'" + value;

I use this function for that purpose:
public static bool IsValidDecimalNumber(this string s)
{
if (string.IsNullOrWhiteSpace(s)) return false; //blank/null strings aren't valid decimal numbers
return !s.Any(c => !(char.IsDigit(c) || c == '.')) && !(s.Count(c => c == '.') > 1);
}
Edit: to elaborate, it returns false if the string is blank or null, false if any character is neither a digit nor a decimal point, and false if there is more than one decimal point. Otherwise, it returns true.
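Another way to approach the original question is to check the runtime type of the value, since Value2 returns numeric cells as double. The sketch below is only an illustration under assumptions: the class and method names are hypothetical, and it treats the NumberFormat string as a .NET custom format string, which works for simple formats such as "0.0000" but not for every Excel format.

```csharp
using System;
using System.Globalization;

public static class CellFormatting
{
    // value2 is what Range.Value2 returned; numberFormat is what Range.NumberFormat returned.
    public static string ToDisplayString(object value2, string numberFormat)
    {
        if (value2 is double number)
        {
            // "General" is Excel-specific and has no .NET equivalent, so use the default format.
            if (string.IsNullOrEmpty(numberFormat) || numberFormat == "General")
                return number.ToString(CultureInfo.InvariantCulture);
            return number.ToString(numberFormat, CultureInfo.InvariantCulture);
        }
        // Text, booleans and empty cells are passed through as strings.
        return Convert.ToString(value2, CultureInfo.InvariantCulture) ?? "";
    }
}
```

If only the displayed text is needed, reading the cell's Text property instead of Value2 may be simpler, since it returns the value as Excel shows it.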

C# - Using OpenXML how to read blank spaces on excel files

I searched for a solution but failed to find one. I currently have a problem reading an Excel file using OpenXML. With perfect data there is no problem, but when the data contains blanks, the columns seem to shift to the left, which produces an error saying the index is not right. I found a solution in which you insert cells in between, but when I tried it, I got an error saying that an object reference was not set to an instance of an object while reading a certain cell with this code (the source is the answer about inserting cells in "How do I have Open XML spreadsheet "uncollapse" cells in a spreadsheet?"):
public static string GetCellValue(SpreadsheetDocument document, Cell cell)
{
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
string value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
else if (cell == null)
{
return null;
}
else
{
return value;
}
}
Are there any other ways to read blank cells as blank without shifting the data to the left?
All help will be appreciated! :)
Thanks!

In Open XML, the XML file does not contain an entry for a blank cell, which is why blank cells are skipped. I faced the same problem. The only solution is to apply some logic of your own.
For Example:
When we read a cell we can get its ColumnName (A,B,C etc.) by the following code
string cellIndex = GetColumnName( objCurrentSrcCell.CellReference );
where
public static string GetColumnName(string cellReference)
{
// Create a regular expression to match the column name portion of the cell name.
Regex regex = new Regex("[A-Za-z]+");
Match match = regex.Match(cellReference);
return match.Value;
}
You can store these cells in a Hashtable where the key is the cell's ColumnName and the value is the cell object. Then, when writing, fetch the cells from the Hashtable in order, according to your own logic.
For example, you may loop from A to Z and read the cell at each key like this:
if(objHashTable.Contains(yourKey))
{
Cell objCell = (Cell) objHashTable[yourKey];
//Insertcell or process cell
}
else
{
//do process for the empty cell like you may add a new blank cell
Cell objCell = new Cell();
//Insert cell or process cell
}
This is the only way to work with Open XML. Adding a blank cell during reading is a waste of time. You can add more logic as needed.
Try this; it will definitely work. Or if you find a better solution, do tell me.
Have a nice day :)
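A variation on the approach above is to turn the column letters into a numeric index and place each value at its real position, so that missing cells simply stay empty. This is a minimal sketch with hypothetical names; it works on (reference, value) pairs rather than on Open XML Cell objects, on the assumption that the reference and text of each cell have already been read as in the code above.

```csharp
using System;

public static class CellReferences
{
    // Converts the letters of a reference such as "C5" or "AB12" to a 1-based column index.
    public static int GetColumnIndex(string cellReference)
    {
        int index = 0;
        foreach (char c in cellReference)
        {
            if (!char.IsLetter(c))
                break;   // the row number starts here
            index = index * 26 + (char.ToUpperInvariant(c) - 'A' + 1);
        }
        return index;
    }

    // Places the values of one row at their real column positions; gaps stay null.
    public static string[] ExpandRow((string Reference, string Value)[] cells, int columnCount)
    {
        var row = new string[columnCount];
        foreach (var cell in cells)
        {
            int column = GetColumnIndex(cell.Reference);
            if (column >= 1 && column <= columnCount)
                row[column - 1] = cell.Value;
        }
        return row;
    }
}
```

A row that only contains cells A1 and C1 then expands to three entries with a null in the middle, instead of two entries shifted to the left.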

Writing string, numeric data to Excel via C# works, but Excel does not treat numeric data correctly

I'm getting result sets from Sybase that I return to a C# client.
I use the below function to write the result set data to Excel:
private static void WriteData(Excel.Worksheet worksheet, string cellRef, ref string[,] data)
{
Excel.Range range = worksheet.get_Range(cellRef, Missing.Value);
if (data.GetLength(0) != 0)
{
range = range.get_Resize(data.GetLength(0), data.GetLength(1));
range.set_Value(Missing.Value, data);
}
}
The data gets written correctly.
The issue is that since I'm using string array to write data (which is a mixture of strings and floats), Excel highlights every cell that contains numeric data with the message "Number Stored as Text".
How do I get rid of this issue?
Many thanks,
Chapax

Try the following: replace your string array with an object array.
var data = new object[2,2];
data[0, 0] = "A";
data[0, 1] = 1.2;
data[1, 0] = null;
data[1, 1] = "B";
var theRange = theSheet.get_Range("D4", "E5");
theRange.Value2 = data;
If I use this code, equivalent to yours:
var data = new string[2,2];
I get the same symptom as you.
As a side benefit, you don't have to cast anything to string: you can fill your array with whatever you want to see displayed.
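If the data already arrives as a string[,], as in the question, one option is to convert it just before writing. The following is a sketch under assumptions: the class and method names are hypothetical, and it treats every entry that parses as a number under the invariant culture as numeric, which may be wrong for identifiers such as zip codes with leading zeros.

```csharp
using System;
using System.Globalization;

public static class ExcelData
{
    // Copies a string grid into an object grid, storing numeric-looking entries as double.
    public static object[,] ToTypedArray(string[,] data)
    {
        int rows = data.GetLength(0);
        int cols = data.GetLength(1);
        var result = new object[rows, cols];
        for (int r = 0; r < rows; ++r)
        {
            for (int c = 0; c < cols; ++c)
            {
                string text = data[r, c];
                if (double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out double number))
                    result[r, c] = number;   // Excel receives a real number
                else
                    result[r, c] = text;     // left as text (or null)
            }
        }
        return result;
    }
}
```

In the WriteData method from the question, the result of ExcelData.ToTypedArray(data) would be passed to set_Value in place of data.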

Try setting the NumberFormat property on the range object.
