Read data from combined Excel columns/rows using C# - c#

I'm trying to read data from an Excel document in C# using Microsoft's COM Interop.
So far, I'm able to load the document and read some data from it. However, I need to read data from two different columns and output these as JSON (for a jQuery Ajax call).
I've made a quick prototype of how my Excel document is structured with the hope that it's a bit easier to explain ;-)
The method I have is called GetExcelDataByCategory(string categoryName), where the categoryName parameter is used to find which column to get the data from.
For example, if I call it with "Category 2" as the parameter, I need to get all the values in the rows of column C and the corresponding dates from column A, so the output will look like this:
Which then needs to be transformed/parsed into JSON.
I've searched high and low on how to achieve this, but with no luck so far :-( I'm aware that I can use the get_Range() method to select a range, but it seems you need to explicitly tell the method which row and which column to get the data from. I.e.: get_Range("A1, C1")
This is my first experience with reading data from an Excel document, so I guess there's a lot to learn ;-) Is there a way to get the output on my second image?
Any help/hint is greatly appreciated! :-)
Thanks in advance.
All the best,
Bo

This is what I would do:
using System.Collections.Generic;
using Excel = Microsoft.Office.Interop.Excel;

Excel.Application xlApp = new Excel.Application();
Excel.Workbook xlWorkbook = xlApp.Workbooks.Open("path to book");
Excel.Worksheet xlSheet = xlWorkbook.Sheets[1]; // get first sheet
Excel.Range xlRange = xlSheet.UsedRange; // get the entire used range
int numberOfRows = xlRange.Rows.Count;
int numberOfCols = xlRange.Columns.Count;
List<int> columnsToRead = new List<int>();
// find the columns whose header matches the categoryName string which
// would be passed into your method
for (int i = 1; i <= numberOfCols; i++)
{
    if (xlRange.Cells[1, i].Value2 != null) // ADDED IN EDIT
    {
        if (xlRange.Cells[1, i].Value2.ToString().Equals(categoryName))
        {
            columnsToRead.Add(i);
        }
    }
}
List<string> columnValue = new List<string>();
// loop over each column number and add results to the list
foreach (int c in columnsToRead)
{
    // start at 2 because the first row is 1 and the header row
    for (int r = 2; r <= numberOfRows; r++)
    {
        if (xlRange.Cells[r, c].Value2 != null) // ADDED IN EDIT
        {
            columnValue.Add(xlRange.Cells[r, c].Value2.ToString());
        }
    }
}
This is the code I would use to read the Excel sheet. Right now it reads every column whose heading (whatever is in the first row) matches the category name, and then all the rows in those columns. It isn't exactly what you asked for (it doesn't format the result as JSON), but I think it is enough to get you over the hump.
EDIT: Looks like there are a few blank cells that are causing problems. A blank cell's Value2 is null in the Interop, so calling Value2.ToString() on it throws an error. I added code to check that the value isn't null before doing anything with it. That prevents the errors.
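The step the answer leaves out, pairing each date with its category value and turning the pairs into JSON, could look roughly like the sketch below. It assumes the dates (column A) and the category values have already been read into two equal-length lists of strings. The Entry and CategoryJson names are made up for illustration, and System.Text.Json is used here as one possible serializer, not as something the original poster prescribed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical row type: one date from column A plus the matching category value.
public class Entry
{
    public string Date { get; set; }
    public string Value { get; set; }
}

public static class CategoryJson
{
    // Pairs dates[i] with values[i] and serializes the pairs as a JSON array.
    public static string ToJson(IList<string> dates, IList<string> values)
    {
        if (dates.Count != values.Count)
            throw new ArgumentException("dates and values must have the same length");

        var entries = new List<Entry>();
        for (int i = 0; i < dates.Count; i++)
        {
            entries.Add(new Entry { Date = dates[i], Value = values[i] });
        }
        return JsonSerializer.Serialize(entries);
    }
}
```

One caveat: the reading loop skips blank cells, so to keep dates and values aligned you would probably want to read both columns in the same pass and skip a row as a whole rather than skipping individual cells.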

For Excel parsing and creation you can use ExcelDataReader: http://exceldatareader.codeplex.com/
and for JSON you can use Json.NET: http://json.codeplex.com/
Both are fairly easy to use. Just have a look at the websites.

Related

How to iterate through a specific row in Excel table via Interop?

So, I'm writing a program that reads table data and puts the cell values in a List. I got it working, but there is one problem: UsedRange takes all cells on the sheet, so there are more items than I need. Also, when I specify the range with ["A:A", Type.Missing], it gives me an exception:
System.ArgumentException: "HRESULT: 0x80070057 (E_INVALIDARG))"
So my question is how to make it correctly?
Code is:
foreach (Excel.Range row in usedRange)
{
    for (int i = 0; i < lastCell.Row; i++)
    {
        if (row.Cells[4, i + 1].Value2 != null)
        {
            personlist.Add(Convert.ToString(row.Cells[4, i + 1].Value2));
        }
        else { i++; }
    }
    foreach (var person in personlist)
    {
        Console.WriteLine(person);
    }
}
UPD: I need the last used row, that's why I'm using UsedRange. If there are any alternatives, like checking if (!= null), I will gladly try them.
I tried giving it a specific range, and tried writing code like the one in "C# - How do I iterate all the rows in Excel._Worksheet?"
and here
https://overcoder.net/q/236542/программно-получить-последнюю-заполненную-строку-excel-с-помощью-c
but maybe I'm the dumb one, because there is more than one article about it and none of them works for me.
The problem is that the 'used range' can include empty cells (who knows how Excel decides on that magic number: if you type a letter in some arbitrary row and then delete it, Excel can decide that cell is still part of your used range). You want your own custom definition of what a 'usedRange' is, which presumably is the range of non-blank rows. There are two straightforward ways of implementing this yourself (which gives you added control over it should you want to customize it).
You can just filter the list after the fact removing all blank entries. Or you can process the list in reverse, skipping rows till you find one matching your criteria
bool startProcessing = false;
for (int i = lastCell.Row - 1; i >= 0; i--)
{
    object value = row.Cells[4, i + 1].Value2;
    if (!startProcessing)
    {
        // still skipping trailing blanks; "valid" is assumed here to mean non-blank
        // (the bool is in case you want blank rows in the middle of the file,
        // otherwise just check for a valid row every time)
        if (value == null) continue;
        startProcessing = true;
    }
    if (value != null)
    {
        personlist.Add(Convert.ToString(value));
    }
    // else { i++; } // this is a bug, will cause a line skip
}
Also, as an aside: when you call i++; in the body of your for loop, the loop header then increments i again, so i effectively goes up by 2 and a row is skipped. Use continue; or just remove the else block altogether.
There's probably a way to get a cellRange matching your criteria, but imo doing it yourself can be better - you can ensure it does exactly what you want.
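Both approaches (filtering out blanks afterwards, or scanning backwards for the last row that matters) can be tried without Excel at all. The sketch below works on a plain list of cell values that is assumed to have been copied out of the sheet already; UsedRows, LastUsedIndex and WithoutBlanks are hypothetical names, and "blank" is assumed to mean null or whitespace-only.

```csharp
using System;
using System.Collections.Generic;

public static class UsedRows
{
    // Scans from the end and returns the index of the last non-blank value,
    // or -1 if every value is blank.
    public static int LastUsedIndex(IList<object> cells)
    {
        for (int i = cells.Count - 1; i >= 0; i--)
        {
            if (!IsBlank(cells[i])) return i;
        }
        return -1;
    }

    // Filters the list after the fact, keeping only non-blank values as strings.
    public static List<string> WithoutBlanks(IList<object> cells)
    {
        var result = new List<string>();
        foreach (object cell in cells)
        {
            if (!IsBlank(cell)) result.Add(Convert.ToString(cell));
        }
        return result;
    }

    private static bool IsBlank(object value)
    {
        return value == null || Convert.ToString(value).Trim().Length == 0;
    }
}
```

LastUsedIndex is the one to use if blank rows in the middle of the data should be kept; WithoutBlanks drops every blank regardless of position.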

Excel.Workbook as a Global variable C#

I am trying to read and write to several Excel sheets, and I have split the functions into different methods.
The problem I am encountering is that I cannot carry the Workbook/Worksheet variables over into the second method.
First method: Open all the relevant Excel documents, i.e. Parts list, Export list etc.
Second method: Copy data from Parts List to first sheet in Export list.
For example in the first method I may have
//Ws = Worksheet
//Wb = Workbook
//Workbooks and applications already defined
var PartsExportWs = PartsExportWb.Sheets[1].Name;
In the Second method I have:
public static void Parts()
{
    int PartsCounterX;
    int TypicalCounterY = 4;
    int NullCounter = 0;
    var ConCatPartsCellValue = new System.Text.StringBuilder();
    for (PartsCounterX = 1; NullCounter <= 3; ++PartsCounterX)
    {
        var PartsCellValue = PartsExportWs.Cells[TypicalCounterY, PartsCounterX].Value;
        // etc ...
However, it errors out at PartsExportWs with the Description of "The name PartsExportWs does not exist in the current context"
I may be wrong but I am assuming that it is due to the fact it is not classed as a Global variable.
(If anyone has any suggestions it would be more than helpful. Even if it is on how to ask the question better!)
You cannot access a local variable from one method in the code from another method.
You either need to pass the variable as a parameter, or store it in a field.
(There is no such thing as a "global variable" in C#.)
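A minimal sketch of both options follows. FakeWorksheet is a made-up stand-in for Excel.Worksheet so the example compiles without the Interop assembly, and the Exporter class and its method names are hypothetical. Note also that the question assigns Sheets[1].Name, which is a string, while the second method uses the variable as a worksheet; the field or parameter would need to hold the worksheet itself.

```csharp
using System;

// Stand-in for Excel.Worksheet so the sketch compiles without the Interop assembly.
public class FakeWorksheet
{
    public string Name { get; set; }
}

public static class Exporter
{
    // Option 1: a static field, visible to every method in this class.
    private static FakeWorksheet partsExportWs;

    // First method: "opens" the documents and stores the worksheet in the field.
    public static void OpenDocuments()
    {
        partsExportWs = new FakeWorksheet { Name = "Parts" };
    }

    // Second method, field version: reads the worksheet stored by OpenDocuments.
    public static string PartsFromField()
    {
        return partsExportWs.Name;
    }

    // Option 2: the caller hands the worksheet over as a parameter.
    public static string PartsFromParameter(FakeWorksheet ws)
    {
        return ws.Name;
    }
}
```

Passing the worksheet as a parameter keeps the dependency explicit; the field is convenient when many methods share the same workbook, but it has to be assigned before any of them run.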

ExcelDNA throwing exception accessing Range.Value2

I am porting an Excel add-in (which used a shim loader) to Excel-DNA, and yes, I have seen the other SO (and off-SO) questions, but nothing resolves my question, and I'm hoping there are newer solutions.
The code is simple.
[ExcelFunction(Name="DoSomething")]
string DoSomething()
{
    var xl = ExcelDna.Application;
    var callerCell = xl.Caller;
    var row = getRow(excelReference.RowFirst+1, callerCell.WorkSheet) ;
}
In GetRow():
var row = (Range)worksheet.Rows[row];
var cell = (Range)bracketRow.Columns[4];
When I check debugger, I can see the retrieved cell is 100% correct because cell.FormulaLocal matches the excel row and column formula.
The value in FormulaLocal is "OtherSheet!A12".
But for some reason, whenever I try cell.Value2, it throws a COMException and nothing else. This is not a multithreaded application and I can't understand why this is happening.
Any ideas?
EDIT:
When I modify the formula to the value it should have gotten had the sheet reference been successful, it doesn't throw.
EDIT 2:
I got around this by adding IsMacroType=true attribute to the excel function. But now xl.Caller returns null, argh
Two issues needed solving:
range.Value2 threw a COMException if the cell has an invalid value e.g. #VALUE in excel.
range.Value2 threw a COMException if the cell referenced another worksheet in the same workbook e.g. "OtherSheet!A2"
To solve this, I set the IsMacroType attribute to true:
[ExcelFunction(Name="DoSomething",IsMacroType=true)]
string DoSomething()
{
    var xl = ExcelDna.Application;
    var callerCell = xl.Caller;
    var row = getRow(excelReference.RowFirst+1, callerCell.WorkSheet) ;
}
The problem now, though, is that IsMacroType causes xl.Caller to return null.
I got around this by:
ExcelReference reference = (ExcelReference)XlCall.Excel(XlCall.xlfCaller);
string sheetName = (string)XlCall.Excel(XlCall.xlSheetNm,reference);
int index = sheetName.IndexOf(']', 0) + 1;
int endIndex = sheetName.Length - index;
sheetName = sheetName.Substring(index, endIndex);
var worksheet = (Worksheet)xl.ActiveWorkbook.Sheets[sheetName];
This is my first rodeo in the Excel world. Is there any side effect to enabling IsMacroType? 'Cause I saw @Govert expressing some concerns about undefined behavior...
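The string handling in that workaround can be isolated into a small helper. The sketch assumes, as the code above does, that the sheet name comes back in the form "[Book1.xlsx]Sheet1"; SheetNames and StripWorkbookPrefix are hypothetical names.

```csharp
using System;

public static class SheetNames
{
    // Turns "[Book1.xlsx]Sheet1" into "Sheet1".
    // A name without "]" is returned unchanged, because IndexOf gives -1 and -1 + 1 == 0.
    public static string StripWorkbookPrefix(string qualifiedName)
    {
        int index = qualifiedName.IndexOf(']') + 1;
        return qualifiedName.Substring(index);
    }
}
```

Substring with a single argument takes everything from the index to the end, so the separate length calculation is not needed.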

Transpose values in excel using C#

I saw this link on Stack Overflow, "C# Transpose() method to transpose rows and columns in excel sheet", and this is what I am trying to do. But the answers there are not much help, as they do not provide the full information needed. I simply want to transpose cells A9:B15 in my Excel sheet and then copy them either into a new xls file, a new worksheet, or better yet delete the current worksheet contents and replace them with the newly transposed contents. Clearly it can be done through the WorksheetFunction.Transpose method, but I can't seem to get it to work as I don't know what rng or Value2 are. I could create a DataTable, but surely using this method is a more appropriate way of doing it. Here is the code from the Stack Overflow question:
Object[,] transposedRange = (Object[,])xlApp.WorksheetFunction.Transpose(rng.Value2);
xlApp.ActiveSheet.Range("A1").Resize(transposedRange.GetUpperBound(0), transposedRange.GetUpperBound(1)) = transposedRange;
Here is my code so far:
Application excel = new Application();
Workbook wb = excel.Workbooks.Open(@"P:\Visual Studio 2013\Projects\Debugging\Debugging\test.htm");
Microsoft.Office.Interop.Excel.Range rng = excel.get_Range("A9:B15");
Object[,] transposeRange = (Object[,])excel.WorksheetFunction.Transpose(rng);
transposeRange = excel.ActiveSheet.Range("A1").Resize(transposeRange.GetUpperBound(0), transposeRange.GetUpperBound(1));
wb.SaveAs(@"P:\Visual Studio 2013\Projects\Debugging\Debugging\testing.xls");
Not sure if I have done the rng right. I am so confused by this.
Is there some reason you need to do this in C#?
If what you want is just what you state, VBA code can accomplish this also. Just
read the transposed range into a variant array
clear the worksheet
write the variant array back to the worksheet at the cell of your choice.
Note that when you write the variant array back to the worksheet, the destination range must be the same size as the array.
Option Explicit
Sub TransposeRange()
    Dim RGtoTranspose As Range
    Dim V As Variant
    Set RGtoTranspose = Range("A9:B15")
    V = WorksheetFunction.Transpose(RGtoTranspose)
    Cells.Clear
    Range("a1").Resize(UBound(V, 1), UBound(V, 2)).Value = V
End Sub
Seems like nobody actually bothered to answer this and it's still the top search engine hit for this issue (as of July 2019, go figure...!), so here's my 2 cents...
I did not understand the hype about the WorksheetFunction.Transpose method. "Objectifying" things around isn't perhaps the cleanest way to go about this, particularly when using the Excel Interop anyway. At the end of the day, Transpose has been a dynamic parameter of the PasteSpecial() method since time immemorial. So why not use it as such? I think this was what prompted some people to suggest using VBA instead of C#... Anyway, this code works and does what the question requires methinks:
First get the references right...
using System;
using Microsoft.Office.Interop.Excel;
using Excel = Microsoft.Office.Interop.Excel;
Then try this...
string filePath = @"P:\Visual Studio 2013\Projects\Debugging\Debugging\test.htm";
string savePath = @"P:\Visual Studio 2013\Projects\Debugging\Debugging\testing.xls";
var excelApp = new Excel.Application()
{
    Visible = true //This is optional
};
Workbooks workbook = excelApp.Workbooks;
workbook.Open(filePath);
Range range = excelApp.get_Range("A9:B15");
range.Copy();
excelApp.ActiveSheet.Range("A1").PasteSpecial(Transpose: true); //voila... :)
range.Delete(XlDeleteShiftDirection.xlShiftToLeft); //delete original range
if (!System.IO.File.Exists(savePath)) //is the workbook already saved?
{
    excelApp.ActiveWorkbook.SaveAs(savePath); //save
}
else
{
    Console.WriteLine("File \"{0}\" already exists.", savePath); //or do whatever...
    Console.ReadLine();
    return;
}
It could be simplified further... but it is more readable like this.
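If the cell values have already been read into a two-dimensional array, the transposition itself is plain array work and does not need Excel. The sketch below is one way to do it; ArrayTranspose is a hypothetical helper name. It uses GetLowerBound so that it should also cope with arrays that do not start at index 0, which is commonly the case for arrays read from a range through the Interop.

```csharp
using System;

public static class ArrayTranspose
{
    // Returns a new zero-based array where result[c, r] holds the value
    // found at row r, column c of the source (counted from its lower bounds).
    public static object[,] Transpose(object[,] source)
    {
        int rows = source.GetLength(0);
        int cols = source.GetLength(1);
        int rowStart = source.GetLowerBound(0);
        int colStart = source.GetLowerBound(1);

        var result = new object[cols, rows];
        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                result[c, r] = source[rowStart + r, colStart + c];
            }
        }
        return result;
    }
}
```

A 7x2 block such as A9:B15 comes out as 2x7, so the destination range has to be resized to 2 rows by 7 columns before the array is written back.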
This was asked a long time ago, but I will leave my solution anyway.
Refs:
using Microsoft.Office.Interop.Excel;
using Excel = Microsoft.Office.Interop.Excel;
The trick here is to get the _Application variable.
If you are using a VSTO add-in with a workbook, you can do it like this:
var app = Globals.ThisWorkbook.Parent as _Application;
For other kind of project do like this:
_Application app2 = new Excel.Application();
My sample (sheet1 is my worksheet):
var sheet1 = Globals.Planilha1;
var arr = new string[]
{
    "test1",
    "test2",
    "test3",
    "test4",
    "test5",
    "test6",
};
// For VSTO ADDINS with Workbook
//var app = Globals.ThisWorkbook.Parent as _Application;
// For any kind of Excel Interop project
_Application app = new Excel.Application();
sheet1.Range["A1:A" + arr.Length].Value = app.WorksheetFunction.Transpose(arr);
The Transpose function can only deal with arrays, lists won't work.
Just place the array inside the app.WorksheetFunction.Transpose function and it will work pretty well.

How to get format type of cell using c# in spreadsheetlight

I am using the SpreadsheetLight library to read Excel sheet (.xlsx) values using C#.
I can read the cell value using the following code:
for (int col = stats.StartColumnIndex; col <= stats.EndColumnIndex; col++)
{
    var value = sheet.GetCellValueAsString(stats.StartRowIndex, col); // where sheet is the current sheet in the Excel file
}
I am getting the cell value. But how can I get the data type of the cell? I have checked in documentation but didn't find the solution.
Note: For .xls Excel files I am using the ExcelLibrary.dll library, where I can easily get the data type of cells using the code below:
for (int i = 0; i <= cells.LastColIndex; i++)
{
    var type = cells[0, i].Format.FormatType;
}
but there is no similar method in spreadsheetlight.
Here is the answer from the developer Vincent Tang after I asked him as I wasn't sure how to use DataType:
Yes use SLCell.DataType. It's an enumeration, but for most data, you'll be working with Number, SharedString and String.
Text data will be SharedString, and possibly String if the text is directly embedded in the worksheet. There's a GetSharedStrings() or something like that.
For numeric data, it will be Number.
For dates, it's a little tricky. The data type is also Number (ignore the Date enumeration because Microsoft Excel isn't using it). For dates, you also have to check the FormatCode, which is in the SLStyle for the SLCell. Use the GetStyles() to get a list. The SLCell.StyleIndex gives you the index to that list.
For example, if your SLCell has cell value "15" and data type SharedString, then look for index 15 in the list of shared strings. If it's "blah" with String data type, then that's it.
If it's 56789 with Number type, then that's it.
Unless the FormatCode is "mm-yyyy" (or some other date format code), then 56789 is actually the number of days since 1 Jan 1900.
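The date case can be sketched as follows. DateTime.FromOADate is the standard .NET conversion for this kind of serial number; it agrees with Excel's 1900-based serials from 1 March 1900 onward (earlier serials are off by one because of Excel's 1900 leap-year quirk). The LooksLikeDate check is only an illustrative assumption, since real format codes vary far more than this, and ExcelDates is a hypothetical helper name.

```csharp
using System;

public static class ExcelDates
{
    // Very rough guess at whether a format code describes a date.
    // Illustrative only: it looks for a year marker or a month/day pattern.
    public static bool LooksLikeDate(string formatCode)
    {
        return formatCode.Contains("yy") || formatCode.Contains("m/d");
    }

    // Converts an Excel serial number (stored with the Number data type) to a DateTime.
    public static DateTime FromSerial(double serial)
    {
        return DateTime.FromOADate(serial);
    }
}
```

With this, a Number cell holding 56789 whose format code is "mm-yyyy" would be read as 24 June 2055 rather than as the plain number.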
He also recommended using GetCellList() in order to obtain the list of SLCell objects in the sheet. However, for some reason that function was not available in my version of SL, so I used GetCells() instead. That returns a dictionary of SLCell objects, with keys of type SLCellPoint.
So for example to get the DataType (which is a CellValues object) of cell A1 do this:
using (SLDocument slDoc = new SLDocument("Worksheet1.xlsx", "Sheet1"))
{
    SLCellPoint slCP = new SLCellPoint();
    slCP.ColumnIndex = SLConvert.ToColumnIndex("A"); // obviously 1, but a useful function to know
    slCP.RowIndex = 1;
    // GetCells() returns the dictionary described above, keyed by SLCellPoint
    CellValues slCV = slDoc.GetCells()[slCP].DataType;
}
By the way, I also had a problem with opening the chm help file. Try this:
Right Click on the chm file and select properties
Click on Unblock button at the bottom of the General Tab
To get the value of the cell try following code
var cellValue = (string)(excelWorksheet.Cells[10, 2] as Excel.Range).Value;
Use this link for more details
Check out the SLCell.DataType Property. Spreadsheetlight documentation mentions that this returns the Cell datatype, in class Spreadsheetlight.SLCell
public CellValues DataType { get; set; }
PS: On a side note, I figured out how to open the chm documentation. Try opening the chm file in Winzip, it opens without any issues.
Hope it helps. Thanks
Well, after a lot of trial and error I found a solution for this.
Based on the format code of a cell we can decide the format type of the cell.
Using the GetCellStyle method we can get the format code of the cell, and from that format code we can decide the format type.
var FieldType = GetDataType(sheet.GetCellStyle(rowIndex, columnIndex).FormatCode);

// Note: formatCode.Last() needs "using System.Linq;"
private string GetDataType(string formatCode)
{
    if (formatCode.Contains("h:mm") || formatCode.Contains("mm:ss"))
    {
        return "Time";
    }
    else if (formatCode.Contains("[$-409]") || formatCode.Contains("[$-F800]") || formatCode.Contains("m/d"))
    {
        return "Date";
    }
    else if (formatCode.Contains("#,##0.0"))
    {
        return "Currency";
    }
    else if (formatCode.Last() == '%')
    {
        return "Percentage";
    }
    else if (formatCode.IndexOf("0") == 0)
    {
        return "Numeric";
    }
    else
    {
        return "String";
    }
}
This method worked for 99% of the cases.
Hope it helps you.
