Linking of Specific excel columns via Open Xml - c#

Using Microsoft.Office.Interop.Excel.dll, I am able to read the data from specific rows and columns of an Excel sheet into lists with the code below:
Excel.Workbook MyWorkBook = Excel_App.Workbooks.Open(path, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);
Excel.Worksheet MyWorksheet = null;
MyWorksheet = (Microsoft.Office.Interop.Excel.Worksheet)MyWorkBook.Sheets[(1)];
Excel.Range Excel_Range;
Excel_Range = MyWorksheet.UsedRange;
SheetCount = MyWorkBook.Sheets.Count;
Lastrow = MyWorksheet.Cells.Find("*", System.Reflection.Missing.Value, System.Reflection.Missing.Value, System.Reflection.Missing.Value, Excel.XlSearchOrder.xlByRows, Excel.XlSearchDirection.xlPrevious, false, System.Reflection.Missing.Value, System.Reflection.Missing.Value).Row;
LastColumn = MyWorksheet.Cells.Find("*", System.Reflection.Missing.Value, System.Reflection.Missing.Value, System.Reflection.Missing.Value, Excel.XlSearchOrder.xlByColumns, Excel.XlSearchDirection.xlPrevious, false, System.Reflection.Missing.Value, System.Reflection.Missing.Value).Column;
for (int i = 8; i <= Lastrow; i++)
{
    List_MAPPING_FILE_A429_PATHS.Add((string)(Excel_Range.Cells[i, 4] as Excel.Range).Value2.ToString());
    List_MAPPING_FILE_ASCB_PATHS.Add((string)(Excel_Range.Cells[i, 5] as Excel.Range).Value2.ToString());
}
Now I want to read the same data into the lists using OpenXml.dll. I tried the code below, but I am stuck on how to proceed:
public void AddtoLogFile()
{
    string temp = @"C:\Ported\DATA\EJETE2_A429RX_TIF_temp.xml";
    using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open(temp, true))
    {
        WorkbookPart workbookPart = myDoc.WorkbookPart;
        WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
        SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
        foreach (Row r in sheetData.Elements<Row>())
        {
            foreach (Cell c in r.Elements<Cell>())
            {
                string text = c.CellValue.Text;
            }
        }
    }
}
Can someone help me with this?

The cell's location is stored in Cell.CellReference.
The cell reference for the first cell would be "A1".
Use this method to extract the column:
private static readonly Regex ColumnNameRegex = new Regex("[A-Za-z]+");
private static string GetColumnName(string cellReference)
{
    if (ColumnNameRegex.IsMatch(cellReference))
        return ColumnNameRegex.Match(cellReference).Value;
    throw new ArgumentOutOfRangeException(cellReference);
}
I'm not sure what you're trying to get from the spreadsheet; I guess you only want the information from cells in a certain column:
foreach (Row r in sheetData.Elements<Row>())
    foreach (Cell c in r.Elements<Cell>())
    {
        if (GetColumnName(c.CellReference) == "A")
        {
            string text = c.CellValue.InnerText;
        }
    }
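The Interop code in the question addresses columns by number (4 and 5, starting at row 8), while Open XML cell references use letters ("D8", "E8"). A minimal sketch of a converter, assuming plain A1-style references without `$` signs; the names `CellRef`, `ColumnIndex` and `RowIndex` are hypothetical helpers, not part of the Open XML SDK:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CellRef
{
    // Column letters followed by the row number, e.g. "D8" or "AA10".
    private static readonly Regex Parts = new Regex(@"^([A-Za-z]+)(\d+)$");

    // Returns the 1-based column number: "A" -> 1, "D" -> 4, "AA" -> 27.
    public static int ColumnIndex(string cellReference)
    {
        Match m = Parts.Match(cellReference);
        if (!m.Success)
            throw new ArgumentException("Not an A1-style reference: " + cellReference);

        int index = 0;
        foreach (char c in m.Groups[1].Value.ToUpperInvariant())
            index = index * 26 + (c - 'A' + 1); // base-26 with digits A=1..Z=26
        return index;
    }

    // Returns the 1-based row number: "D8" -> 8.
    public static int RowIndex(string cellReference)
    {
        Match m = Parts.Match(cellReference);
        if (!m.Success)
            throw new ArgumentException("Not an A1-style reference: " + cellReference);
        return int.Parse(m.Groups[2].Value);
    }
}
```

With these helpers, the loop above could keep only cells where `CellRef.RowIndex(c.CellReference) >= 8` and `CellRef.ColumnIndex(c.CellReference)` is 4 or 5.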
For cells that hold text, the value returned by CellValue.InnerText is an index into the SharedStringTable, which holds all the strings for the workbook. You will need to look up the actual text in the SharedStringTable.
For this I use a method that takes the Cell and SharedStringTable to return the value:
public static string GetCellV(Cell cell, SharedStringTable ss)
{
    string cellV = null;
    try
    {
        cellV = cell.CellValue.InnerText;
        if (cell.DataType != null
            && cell.DataType.Value == CellValues.SharedString)
        {
            // InnerText is an index into the shared string table
            cellV = ss.ElementAt(Int32.Parse(cellV)).InnerText;
        }
    }
    catch (Exception)
    {
        cellV = " ";
    }
    return cellV;
}


Loop through Excel files and copy correct range in a separate file with C#

Today I decided to automate an Excel task with C#. This is probably the first time I am doing something like this, so the problems are plenty.
The task:
The idea is the following: I have 4 Excel files in the folder strPath. I have to loop through all of them and create a file called Report.xlsx in the same folder, with the information from those files.
The information I need is everything below row 9, so the first row to copy is row 10. The first file in the loop is saved as Report, and the bMakeOnce value is changed. After the first file has been processed and saved, the else branch is taken for the remaining files. There I locate the last used row of each file and try to copy the range into sheetReport.
The questions:
First of all, any ideas for code improvement?
Whenever I loop through the files, I get a prompt telling me that each file in the loop is already open.
Is there a better way to do the range copy? Currently, I simply put each copied range at row 200 * n, to avoid confusing myself.
Any idea why I do not get anything in sheetReport, except for the first file?
The code I am using (the initial version; for the current one, see GitHub below):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using System.Reflection;
using Excel = Microsoft.Office.Interop.Excel;
using Word = Microsoft.Office.Interop.Word;
class MainClass
static void Main()
string strPath = Path.GetFullPath(Path.Combine(Directory.GetCurrentDirectory(), @"..\..\..\"));
string[] strFiles = Directory.GetFiles(strPath);
Excel.Application excel = null;
bool bMakeOnce = true;
int intFirstLine = 10;
int intLastColumn = 50;
int lastRow;
int lastRowReport;
Excel.Workbook wkbReport = null;
string strWkbReportPath;
int n = 0;
foreach (string strFile in strFiles)
Excel.Workbook wkb = null;
Excel.Worksheet sheet = null;
Excel.Worksheet sheetReport = null;
Excel.Range rngLast = null;
Excel.Range rngLastReport = null;
Excel.Range rngToCopy = null;
Excel.Range rngDestination = null;
excel = new Excel.Application();
excel.Visible = true;
wkb = OpenBook(excel, strFile);
if (bMakeOnce)
bMakeOnce = false;
strWkbReportPath = wkb.Path + "\\" + "Report.xlsx";
wkbReport = OpenBook(excel, strWkbReportPath);
wkb = OpenBook(excel, strFile);
sheetReport = wkbReport.Worksheets[1];
sheet = wkb.Worksheets[1];
rngLastReport = sheetReport.Cells.SpecialCells(Excel.XlCellType.xlCellTypeLastCell, Type.Missing);
rngLast = sheet.Cells.SpecialCells(Excel.XlCellType.xlCellTypeLastCell, Type.Missing);
rngToCopy = sheet.Range[sheet.Cells[intFirstLine, 1], sheet.Cells[rngLast.Row, intLastColumn]];
int size = rngToCopy.Rows.Count;
rngDestination = sheetReport.Range[sheetReport.Cells[200 * n, 1], sheetReport.Cells[200 * n + size, intLastColumn]];
public static Excel.Workbook OpenBook(Excel.Application excelInstance, string fileName, bool readOnly = false, bool editable = true, bool updateLinks = true)
Excel.Workbook book = excelInstance.Workbooks.Open(
fileName, updateLinks, readOnly,
Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, editable, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing);
return book;
Now it works somehow, producing what I want:
using System;
using System.IO;
using Excel = Microsoft.Office.Interop.Excel;
class MainClass
static void Main()
string strPath = Path.GetFullPath(Path.Combine(Directory.GetCurrentDirectory(), @"..\..\..\"));
string[] strFiles = Directory.GetFiles(strPath);
Excel.Application excel = null;
bool bMakeOnce = true;
string strReportName = "Report.xlsx";
int intFirstLine = 10;
int intLastColumn = 50;
int lastRow;
int lastRowReport;
int intTotalRows;
Excel.Workbook wkbReport = null;
string strWkbReportPath;
int n = 0;
excel = new Excel.Application();
excel.Visible = true;
foreach (string strFile in strFiles)
if (strFile.Contains(strReportName))
Console.WriteLine(strReportName + " is deleted.");
foreach (string strFile in strFiles)
if (strFile.Contains(strReportName))
Excel.Workbook wkb = null;
Excel.Worksheet sheet = null;
Excel.Worksheet sheetReport = null;
Excel.Range rngLastReport = null;
Excel.Range rngToCopy = null;
wkb = Open(excel, strFile);
if (bMakeOnce)
bMakeOnce = false;
strWkbReportPath = wkb.Path + "\\" + strReportName;
wkbReport = Open(excel, strWkbReportPath);
sheetReport = wkbReport.Worksheets[1];
sheet = wkb.Worksheets[1];
//lastRow = sheet.Cells[1, 3].get_End(Excel.XlDirection.xlUp).Row;
intTotalRows = sheet.Rows.Count;
lastRow = sheet.Cells[intTotalRows, 1].End(Excel.XlDirection.xlUp).Row;
lastRowReport = sheetReport.Cells[intTotalRows, 1].End(Excel.XlDirection.xlUp).Row;
//lastRowReport = sheetReport.Cells[intTotalRows, 1].get_End(Excel.XlDirection.xlUp).Row;
//lastRowReport = sheetReport.Cells[intTotalRows, intTotalRows.End[Excel.XlDirection.xlUp]].Row;
rngToCopy = sheet.Range[sheet.Cells[intFirstLine,1],sheet.Cells[lastRow, intLastColumn]];
int size = rngToCopy.Rows.Count;
rngLastReport = sheetReport.Range[sheetReport.Cells[lastRowReport+1, 1], sheetReport.Cells[lastRowReport + 1+size, intLastColumn]];
public static Excel.Workbook Open(Excel.Application excelInstance, string fileName, bool readOnly = false, bool editable = true, bool updateLinks = true)
Excel.Workbook book = excelInstance.Workbooks.Open(
fileName, updateLinks, readOnly,
Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, editable, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing);
return book;
//public static Excel.Workbook OpenBook(Excel.Application excelInstance, string fileName, bool readOnly = false, bool editable = true, bool updateLinks = true)
// Excel.Workbook book = excelInstance.Workbooks.Open(
// fileName, updateLinks, readOnly,
// Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
// Type.Missing, editable, Type.Missing, Type.Missing, Type.Missing,
// Type.Missing, Type.Missing);
// return book;
I have therefore posted it on Code Review here:

How to read the entire worksheet from excel

I am reading data from numerous Excel worksheets, but the performance is slow since I am fetching each column directly. Is there a way to read the entire UsedRange into memory with one call and then process the rows and columns locally?
The code I have is basically this:
xlWorkSheet = (Worksheet)_xlWorkBook.Worksheets.get_Item(1);
var range = xlWorkSheet.UsedRange;
for (var rCnt = 2; rCnt <= range.Rows.Count; rCnt++)
{
    // Process column entries
}
I had the same problem while handling a very large Excel file.
I managed to read it as a range and then transformed it to a List<List<string>> using AsParallel() on each row.
That made it run much faster.
Here is the code:
private List<List<string>> ReadExcelFile(string fileName)
{
    Excel.Application xlApp = null;
    Workbook xlWorkbook = null;
    Sheets xlSheets = null;
    Worksheet xlSheet = null;
    var results = new List<List<string>>();
    try
    {
        xlApp = new Microsoft.Office.Interop.Excel.Application();
        xlWorkbook = xlApp.Workbooks.Open(fileName, Type.Missing, true, Type.Missing, Type.Missing, Type.Missing, true, XlPlatform.xlWindows, Type.Missing,false, false, Type.Missing, false, Type.Missing, Type.Missing);
        xlSheets = xlWorkbook.Sheets as Sheets;
        xlSheet = xlSheets[1];
        // Let's say your range is from A1 to DG5200
        var cells = xlSheet.get_Range("A1", "DG5200");
        results = ExcelRangeToListsParallel(cells);
    }
    catch (Exception)
    {
        results = null;
    }
    finally
    {
        // Release the COM objects in reverse order of acquisition
        if (xlSheet != null)
            Marshal.ReleaseComObject(xlSheet);
        if (xlSheets != null)
            Marshal.ReleaseComObject(xlSheets);
        if (xlWorkbook != null)
            Marshal.ReleaseComObject(xlWorkbook);
        if (xlApp != null)
            Marshal.ReleaseComObject(xlApp);
        xlApp = null;
    }
    return results;
}
private List<List<String>> ExcelRangeToListsParallel(Excel.Range cells)
{
    return cells.Rows.Cast<Excel.Range>().AsParallel().Select(row =>
    {
        return row.Cells.Cast<Excel.Range>().Select(cell =>
        {
            var cellContent = cell.Value2;
            return (cellContent == null) ? String.Empty : cellContent.ToString();
        }).Cast<string>().ToList();
    }).ToList();
}
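For the "one call" part of the question, another option is to read `Value2` of the whole range once, which for a multi-cell range returns a two-dimensional `object[,]` whose indices start at 1, and then walk that array in memory. The sketch below shows only the in-memory step so it can run without Excel; `RangeValues`, `ToRows` and `SampleValues` are hypothetical names, and `SampleValues` merely stands in for `(object[,])range.Value2`.

```csharp
using System;
using System.Collections.Generic;

public static class RangeValues
{
    // Converts a 2-D value array into rows of strings. The bounds are read
    // from the array itself, so 1-based arrays from Excel work unchanged.
    public static List<List<string>> ToRows(object[,] values)
    {
        var rows = new List<List<string>>();
        for (int r = values.GetLowerBound(0); r <= values.GetUpperBound(0); r++)
        {
            var row = new List<string>();
            for (int c = values.GetLowerBound(1); c <= values.GetUpperBound(1); c++)
            {
                object v = values[r, c];
                row.Add(v == null ? string.Empty : Convert.ToString(v)); // empty cells are null
            }
            rows.Add(row);
        }
        return rows;
    }

    // Stand-in for (object[,])range.Value2: a 2x3 array with lower bound 1.
    public static object[,] SampleValues()
    {
        var a = (object[,])Array.CreateInstance(typeof(object), new[] { 2, 3 }, new[] { 1, 1 });
        a[1, 1] = "Name"; a[1, 2] = "Qty"; a[1, 3] = null;
        a[2, 1] = "Bolt"; a[2, 2] = 12.0;  a[2, 3] = true;
        return a;
    }
}
```

In the Interop case the call would look like `RangeValues.ToRows((object[,])xlWorkSheet.UsedRange.Value2)`. Note that a single-cell range returns a scalar rather than an array, so that case needs separate handling.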

How to persist Excel cell formats in C# Interop?

I am reading an Excel sheet programmatically using Microsoft.Office.Interop.Excel in C#.
I am able to read it row by row and convert each row to a string array. Then I add these rows to a DataTable.
Everything works fine, except that one of the columns in the Excel sheet contains date values, and when I fetch it from the Excel Range object and cast it to a string array, the date values get converted to some sort of decimal number.
For example:
If the date value is '6/4/2016 8:14:39 PM', I get the value as '42522.5224305556'
If the date value is '5/27/2016 1:10:12 PM', I get the value as '42517.54875'
Below is my code:
private System.Data.DataTable GetTicketsFromExcel(string excelFilePath)
System.Data.DataTable dtblTickets = new System.Data.DataTable();
Microsoft.Office.Interop.Excel.Application excelApp = new Microsoft.Office.Interop.Excel.Application();
Worksheet ws = new Worksheet();
Workbook wb = null;
wb = excelApp.Workbooks.Open(excelFilePath, Type.Missing, Type.Missing,
Type.Missing, Type.Missing,
Type.Missing, Type.Missing,
Type.Missing, Type.Missing,
Type.Missing, Type.Missing,
Type.Missing, Type.Missing,
Type.Missing, Type.Missing);
ws = (Microsoft.Office.Interop.Excel.Worksheet)wb.Sheets.get_Item(1);
Range usedRange = ws.UsedRange;
Range rowRange;
string[] lsRow = null;
for (int i = 1; i <= usedRange.Columns.Count; i++)
dtblTickets.Columns.Add(usedRange.Cells[5, i].Value.ToString());
string sortColumn = "Reported On";
string sortDirection = "DESC";
dtblTickets.Columns[sortColumn].DataType = typeof(DateTime);
for (int row = 6; row <= usedRange.Rows.Count; row++)
rowRange = usedRange.Rows[row];
object[,] cellValues = (object[,])rowRange.Value2;
lsRow = cellValues.Cast<object>().Select(o => Convert.ToString(o)).ToArray<string>();
dtblTickets.DefaultView.Sort = sortColumn + " " + sortDirection;
dtblTickets = dtblTickets.DefaultView.ToTable();
catch (Exception ex)
ws = null;
wb = null;
excelApp = null;
return dtblTickets;
Please note:
I don't want to use OLEDB to read and export this.
I want to be able to read the Excel sheet row by row (without extracting each cell value and converting it).
I don't want to convert or reformat the data in the original Excel document.
Can someone please help me with this?
I'm not quite sure if you want to solve the problem this way, but one way is to change the format of the cells (or the whole row or column) in Excel:
Right-click on a cell.
Choose Format Cells.
Under "Number", select the category "Text" for the cells.
I've tested it and it worked.
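If changing the workbook is not an option, the decimal numbers can also be converted back in code. `Value2` reports dates as OLE Automation serial numbers, which `DateTime.FromOADate` understands. The sketch below assumes the workbook uses the default 1900 date system and that the caller knows which column holds dates, since plain numbers also arrive as `double`; `ExcelDates` and `ToDateTime` are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class ExcelDates
{
    // Converts a Value2 result from a date-formatted cell into a DateTime.
    // Apply this only to columns known to hold dates.
    public static DateTime ToDateTime(object value2)
    {
        double serial = Convert.ToDouble(value2, CultureInfo.InvariantCulture);
        return DateTime.FromOADate(serial);
    }
}
```

For the second sample value from the question, `ExcelDates.ToDateTime(42517.54875)` gives May 27, 2016, 1:10:12 PM. Reading `Value` instead of `Value2` is another route, since `Value` keeps the Date type for date-formatted cells.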

Reading data from excel 2010 using Microsoft.Office.Interop.Excel

I am not able to read data in Excel. Here is the code I am using:
using Excel = Microsoft.Office.Interop.Excel;
Excel.Application xlApp = new Excel.Application();
Excel.Workbook xlWorkbook = xlApp.Workbooks.Open(@"Book1.xlsx", 0, true, 5, "", "", true, Excel.XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
Excel._Worksheet xlWorksheet = (Excel._Worksheet)xlWorkbook.Sheets[1];
Excel.Range xlRange = xlWorksheet.UsedRange;
int rowCount = xlRange.Rows.Count;
int colCount = xlRange.Columns.Count;
for (int i = 1; i <= rowCount; i++)
for (int j = 1; j <= colCount; j++)
I get a message box that says something about System.__ComObject instead of a value.
How can I fix this?
I found the solution to the above; here is the code:
string temp = (string)(xlRange.Cells[i, j] as Excel.Range).Value2;
Haven't tested it, but I think it should read
or alternatively
Try this:
Use the following function to get the data of the N'th sheet as a DataTable object:
public DataTable GetWorkSheet(int workSheetID)
{
    string pathOfExcelFile = fileFullName;
    DataTable dt = new DataTable();
    try
    {
        excel.Application excelApp = new excel.Application();
        excelApp.DisplayAlerts = false; //Don't want Excel to display error messageboxes
        excel.Workbook workbook = excelApp.Workbooks.Open(pathOfExcelFile, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing); //This opens the file
        excel.Worksheet sheet = (excel.Worksheet)workbook.Sheets.get_Item(workSheetID); //Get the requested sheet in the file
        int lastRow = sheet.Cells.SpecialCells(excel.XlCellType.xlCellTypeLastCell, Type.Missing).Row;
        int lastColumn = sheet.Cells.SpecialCells(excel.XlCellType.xlCellTypeLastCell, Type.Missing).Column;
        excel.Range oRange = sheet.get_Range(sheet.Cells[1, 1], sheet.Cells[lastRow, lastColumn]);//("A1",lastColumnIndex + lastRow.ToString());
        for (int i = 0; i < oRange.Columns.Count; i++)
        {
            dt.Columns.Add("a" + i.ToString());
        }
        // Read the whole range in one call; the array is 1-based
        object[,] cellValues = (object[,])oRange.Value2;
        object[] values = new object[lastColumn];
        for (int i = 1; i <= lastRow; i++)
        {
            for (int j = 0; j < dt.Columns.Count; j++)
            {
                values[j] = cellValues[i, j + 1];
            }
            dt.Rows.Add(values);
        }
        workbook.Close(false, Type.Missing, Type.Missing);
    }
    catch (Exception ex)
    {
        System.Windows.Forms.MessageBox.Show(ex.Message, "Error", System.Windows.Forms.MessageBoxButtons.OK, System.Windows.Forms.MessageBoxIcon.Error);
    }
    return dt;
}
Try this code:
This code works successfully for me.

Generic Parser Design

I have implemented this function for parsing employee details. Similarly, I will have to parse sales, customers, etc., and for that I need to create two more functions. The code will be repeated in all the functions, the only differences being:
the return type of the function
the type of object that is instantiated
the cells to read
Is there any way to move the repeated code into a class and configure it so that I can reuse it?
public List<Employee> ParseEmployee(string filePath)
Application _excelApp = null;
Workbooks workBooks = null;
Workbook workBook = null;
Sheets wSheets = null;
Worksheet wSheet = null;
Range xlRange = null;
Range xlRowRange = null;
Range xlcolRange = null;
List<Employee> empLst= new List<Employee>();
_excelApp = new Application();
workBooks = _excelApp.Workbooks;
workBook = workBooks.Open(filePath, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing);
wSheets = (Sheets)workBook.Sheets;
wSheet = (Worksheet)wSheets.get_Item(1);
xlRange = wSheet.UsedRange;
xlRowRange = xlRange.Rows;
xlcolRange = xlRange.Columns;
int rowCount = xlRowRange.Count;
int colCount = xlcolRange.Count;
for (int i = 2; i <= rowCount; i++)
Range cell1 = xlRange.Cells[i, 1] as Range;
Range cell2 = xlRange.Cells[i, 2] as Range;
Range cell3 = xlRange.Cells[i, 3] as Range;
object val1 = cell1.Value2;
object val2 = cell2.Value2;
object val3 = cell3.Value2;
Employee emp = new Employee();
emp.FirstName = val1.ToString();
emp.LastName = val2.ToString();
emp.EmpID = val3.ToString();
catch (Exception exp)
workBook.Close(false, Type.Missing, Type.Missing);
return empLst;
I think the visitor pattern might be a good fit here. You modify the function you have above to include a parameter called visitor. Then you modify your for loop to pass relevant data to the visitor object:
for (int i = 2; i <= rowCount; i++)
{
    visitor.VisitRow(xlRange.Cells, i);
}
The visitor.VisitRow() function extracts the data it needs and keeps an internal reference to the extracted objects. You will have different visitors: one for employers, one for sales, one for customers, etc.
In the end, you will write something like this:
Visitor employerVisitor = new EmployerVisitor();
Visitor salesVisitor = new SalesVisitor();
Parse("workbook-employers.xls", employerVisitor);
Parse("workbook-sales.xls", salesVisitor);
List<Employee> employers = employerVisitor.GetData();
List<Sale> sales = salesVisitor.GetData();
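As written, a single non-generic `Visitor` type cannot return both `List<Employee>` and `List<Sale>` from `GetData()`. A minimal sketch of one way to type this, assuming a generic interface; `IRowVisitor<T>`, `EmployeeVisitor` and `RowParser` are hypothetical names, and plain `string[]` rows stand in for the Excel range so the example runs without Interop:

```csharp
using System.Collections.Generic;

public class Employee
{
    public string FirstName;
    public string LastName;
    public string EmpID;
}

// Each visitor knows which cells it needs and what object to build from them.
public interface IRowVisitor<T>
{
    void VisitRow(string[] cells);
    List<T> GetData();
}

public class EmployeeVisitor : IRowVisitor<Employee>
{
    private readonly List<Employee> _items = new List<Employee>();

    public void VisitRow(string[] cells)
    {
        _items.Add(new Employee { FirstName = cells[0], LastName = cells[1], EmpID = cells[2] });
    }

    public List<Employee> GetData() { return _items; }
}

public static class RowParser
{
    // Stand-in for the Excel loop: skips the header row and
    // hands every remaining row to the visitor.
    public static void Parse<T>(IEnumerable<string[]> rows, IRowVisitor<T> visitor)
    {
        bool header = true;
        foreach (string[] row in rows)
        {
            if (header) { header = false; continue; }
            visitor.VisitRow(row);
        }
    }
}
```

A sales visitor would implement `IRowVisitor<Sale>` in the same way, and the shared loop in `RowParser.Parse` stays unchanged.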
You could expose this from a generic class, along the lines of:
public class ObjectParser<T>
public List<T> ParseObject(string filePath, Func<Range, T> f)
Application _excelApp = null;
Workbooks workBooks = null;
Workbook workBook = null;
Sheets wSheets = null;
Worksheet wSheet = null;
Range xlRange = null;
Range xlRowRange = null;
Range xlcolRange = null;
List<T> lst= new List<T>();
_excelApp = new Application();
workBooks = _excelApp.Workbooks;
workBook = workBooks.Open(filePath, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing);
wSheets = (Sheets)workBook.Sheets;
wSheet = (Worksheet)wSheets.get_Item(1);
xlRange = wSheet.UsedRange;
xlRowRange = xlRange.Rows;
xlcolRange = xlRange.Columns;
int rowCount = xlRowRange.Count;
int colCount = xlcolRange.Count;
for (int i = 2; i <= rowCount; i++)
catch (Exception exp)
workBook.Close(false, Type.Missing, Type.Missing);
return lst;
To use this:
ObjectParser<Employee> op = new ObjectParser<Employee>();
op.ParseObject(filepath, r => /* insert code to handle Employee here */);
My concern here is that some of the Marshal.ReleaseComObject() calls are pushed into the lambda that is passed in, which makes it a little heavyweight. Can you tell us more about the differences in which cells are used between Employee and the other types?
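One way to ease that concern might be to hand the lambda plain cell values instead of Range objects, so that all COM handling stays inside the parser. The sketch below shows only the mapping step under that assumption; `RowMapper`, `ParseRows` and the fields of `Sale` are hypothetical, and `string[]` rows stand in for values already read from the sheet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Sale
{
    public string Product;
    public decimal Amount;
}

public static class RowMapper
{
    // Skips the header row and maps every remaining row with the supplied function.
    public static List<T> ParseRows<T>(IEnumerable<string[]> rows, Func<string[], T> map)
    {
        return rows.Skip(1).Select(map).ToList();
    }
}
```

The caller then only describes which cells to read, for example `RowMapper.ParseRows(rows, c => new Sale { Product = c[0], Amount = decimal.Parse(c[1], CultureInfo.InvariantCulture) })`.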
I have refactored my code to something like this:
class ExcelParser : IDisposable
bool disposed = false;
Application _excelApp = null;
Workbooks workBooks = null;
Workbook workBook = null;
Sheets wSheets = null;
Worksheet wSheet = null;
Range xlRange = null;
Range xlRowRange = null;
Range xlcolRange = null;
public bool Load(string filePath)
bool bFlag = true;
_excelApp = new Application();
workBooks = _excelApp.Workbooks;
workBook = workBooks.Open(filePath, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing);
wSheets = (Sheets)workBook.Sheets;
wSheet = (Worksheet)wSheets.get_Item(1);
xlRange = wSheet.UsedRange;
xlRowRange = xlRange.Rows;
xlcolRange = xlRange.Columns;
catch (Exception exp)
return bFlag;
public int GetRowCount()
int rowCount = 0;
if(xlRowRange != null)
rowCount = xlRowRange.Count;
return rowCount;
public string GetValue(int rowIndex, int colIndex)
string value = "";
Range cell = null;
cell = xlRange.Cells[rowIndex, colIndex] as Range;
object val = cell.Value2;
value = val.ToString();
catch (Exception exp)
return value;
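protected virtual void Dispose(bool disposing)
{
    if (!this.disposed)
    { // don't dispose more than once
        if (disposing)
        {
            // disposing==true means you're not in the finalizer, so
            // you can reference other objects here
        }
        if (workBook != null)
            workBook.Close(false, Type.Missing, Type.Missing);
        if (_excelApp != null)
            _excelApp.Quit();
        if (xlRowRange != null)
            Marshal.ReleaseComObject(xlRowRange);
        if (xlRange != null)
            Marshal.ReleaseComObject(xlRange);
        if (xlcolRange != null)
            Marshal.ReleaseComObject(xlcolRange);
        if (wSheet != null)
            Marshal.ReleaseComObject(wSheet);
        if (wSheets != null)
            Marshal.ReleaseComObject(wSheets);
        if (workBook != null)
            Marshal.ReleaseComObject(workBook);
        if (workBooks != null)
            Marshal.ReleaseComObject(workBooks);
        if (_excelApp != null)
            Marshal.ReleaseComObject(_excelApp);
        this.disposed = true;
    }
}
public void Dispose()
{
    Dispose(true);
    GC.SuppressFinalize(this);
}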
And the calling code looks like this:
public List<Employee> Handle(string filePath)
List<Employee> empLst = new List<Employee>();
ExcelParser exlParser = new ExcelParser();
if (exlParser.Load(filePath))
int rowCount = exlParser.GetRowCount();
for (int i = 2; i <= rowCount; i++)
Employee emp = new Employee();
emp.FirstName = exlParser.GetValue(i, 1);
emp.LastName = exlParser.GetValue(i, 2);
emp.EmpID = exlParser.GetValue(i, 3);
catch (Exception exp)
return empLst;
So now I can reuse the parser wherever I wish. Please comment on whether this is correct.

