I need to import an Excel file with two sheets into the database, and I am using an SSIS package to do it. I can make the Excel file path dynamic by setting expressions, but the sheets in the workbook also change names. How can I make the sheet names dynamic as well?
I tried using Microsoft.Office.Interop.Excel in my DEV code, but the PROD server does not have Excel installed. Can somebody help me resolve this?
Thanks in advance.
Try adding something similar to the script below, which can be found at Code Spot - Dynamic Sheet Name in SSIS Excel Spreadsheet Imports. It doesn't require Excel to be installed on the machine.
string excelFile = null;
string connectionString = null;
OleDbConnection excelConnection = null;
DataTable tablesInFile = null;
int tableCount = 0;
DataRow tableInFile = null;
string currentTable = null;
int tableIndex = 0;
string[] excelTables = null;
excelFile = Dts.Variables["User::ExcelFile"].Value.ToString();
connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + excelFile + ";Extended Properties=Excel 8.0";
excelConnection = new OleDbConnection(connectionString);
excelConnection.Open();
tablesInFile = excelConnection.GetSchema("Tables");
tableCount = tablesInFile.Rows.Count;
excelTables = new string[tableCount];
foreach (DataRow tableInFile_loopVariable in tablesInFile.Rows)
{
tableInFile = tableInFile_loopVariable;
currentTable = tableInFile["TABLE_NAME"].ToString();
excelTables[tableIndex] = currentTable;
tableIndex += 1;
}
// Release the file handle once the sheet names have been collected.
excelConnection.Close();
Dts.Variables["User::SheetName"].Value = excelTables[0];
Dts.TaskResult = (int)ScriptResults.Success;
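One caveat with taking excelTables[0]: the schema table can also list defined names and print areas, so the first entry is not guaranteed to be a worksheet. The following is a minimal sketch of a filter, assuming worksheet entries end in $ (or $' when the name is quoted). The class and method names are hypothetical and are not part of the original script.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SheetNameFilter
{
    // Assumption: worksheet entries in the schema table end with "$",
    // or with "$'" when the provider quotes the name (e.g. names with spaces).
    public static bool IsWorksheet(string tableName)
    {
        return tableName.EndsWith("$", StringComparison.Ordinal)
            || tableName.EndsWith("$'", StringComparison.Ordinal);
    }

    // Keeps only the entries that look like worksheets, preserving input order.
    public static List<string> WorksheetsOnly(IEnumerable<string> tableNames)
    {
        return tableNames.Where(IsWorksheet).ToList();
    }
}
```

In the script above, you could pass excelTables through SheetNameFilter.WorksheetsOnly before assigning the first element to User::SheetName.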
I use a SQL script to load the Excel file as-is into a staging table, and from there it is processed using SSIS / T-SQL. It's relatively fast and reliable. You can download the driver by itself; you don't need an Office installation to use this method.
/*Drop table if exists*/
IF OBJECT_ID('table_1', 'U') IS NOT NULL
EXEC ('DROP TABLE table_1')
/*Load using access driver, can probably work with excel too.*/
select *
into [table_1]
from openrowset('MSDASQL'
,'Driver={Microsoft Access Text Driver (*.txt, *.csv)}'
,'select * from D:\folder\file.csv')
I want to read all data from an xls file using OLEDB, but I don't have any experience in that.
string filename = @"C:\Users\sasa\Downloads\user-account-creation_2.xls";
string connString = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filename + ";Extended Properties='Excel 8.0;HDR=YES'";
using (System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection(connString))
{
conn.Open();
System.Data.OleDb.OleDbCommand selectCommand = new System.Data.OleDb.OleDbCommand("select * from [Sheet1$]", conn);
System.Data.OleDb.OleDbDataAdapter adapter = new System.Data.OleDb.OleDbDataAdapter(selectCommand);
DataTable dt = new DataTable();
adapter.Fill(dt);
int counter = 0;
foreach (DataRow row in dt.Rows)
{
String dataA = row["email"].ToString();
// String dataB= row["DataB"].ToString();
Console.WriteLine(dataA + " = ");
counter++;
if (counter >= 40) break;
}
}
I want to read all the data from the email column.
I get this error:
'Sheet$' is not a valid name. Make sure that it does not include invalid characters or punctuation and that it is not too long
Well, you don't have a sheet called Sheet1 do you? Your sheet seems to be called "email address from username" so your query should be....
Select * From ['email address from username$']
Also please don't use Microsoft.Jet.OLEDB.4.0 as it's pretty much obsolete now. Use Microsoft.ACE.OLEDB.12.0. If you specify Excel 12.0 in the extended properties it will open both .xls and .xlsx files.
You can also load the DataTable with a single line...
dt.Load(new System.Data.OleDb.OleDbCommand("Select * From ['email address from username$']", conn).ExecuteReader());
To read the names of the tables in the file use...
DataTable dtTablesList = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
foreach (DataRow drTable in dtTablesList.Rows)
{
//Do Something
//But be careful as this will also return Defined Names. i.e ranges created using the Defined Name functionality
//Actual Sheet names end with $ or $'
if (drTable["Table_Name"].ToString().EndsWith("$") || drTable["Table_Name"].ToString().EndsWith("$'"))
{
Console.WriteLine(drTable["Table_Name"]);
}
}
Is it possible to use the Open XML SDK?
https://learn.microsoft.com/en-us/office/open-xml/how-to-retrieve-the-values-of-cells-in-a-spreadsheet
I am importing an Excel file into a DataTable in my ASP.NET project.
I have the code below:
string excelConString = string.Format(
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};" +
"Extended Properties='Excel 8.0;" +
"IMEX=1;TypeGuessRows=0;ImportMixedTypes=Text;'", filepath);
using (OleDbConnection connection = new OleDbConnection(excelConString))
{
connection.Open();
string worksheet;
worksheet = "Sheet 1$";
string connStr;
connStr = string.Format("Select * FROM `{0}`", worksheet);
OleDbDataAdapter daSheet = new OleDbDataAdapter(connStr, connection);
DataSet dataset = new DataSet();
DataTable table;
table = new DataTable();
daSheet.Fill(table);
dataset.Tables.Add(table);
connStr = string.Format("Select * FROM `{0}$`", worksheet);
table = new DataTable();
daSheet.Fill(table);
dataset.Tables.Add(table);
}
When I run the code above to import the Excel file, the last data is always missing because it contains special characters such as:
"İ,Ö,Ş" etc.
How can I solve this problem? I added the following to the connection string:
"IMEX=1;TypeGuessRows=0;ImportMixedTypes=Text;
However, it is not working for me.
Any help will be appreciated.
Thank you
Answering this question for other readers, if any.
If you prefer to work with POCOs directly against the Excel file, I recommend my tool Npoi.Mapper, a convention-based mapper between strongly typed objects and Excel data via NPOI.
Get objects from Excel (XLS or XLSX)
var mapper = new Mapper("Book1.xlsx");
var objs1 = mapper.Take<SampleClass>("sheet2");
// You can take objects from the same sheet with different type.
var objs2 = mapper.Take<AnotherClass>("sheet2");
Export objects to Excel (XLS or XLSX)
//var objects = ...
var mapper = new Mapper();
mapper.Save("test.xlsx", objects, "newSheet", overwrite: false);
Put different types of objects into memory workbook and export together.
var mapper = new Mapper("Book1.xlsx");
mapper.Put(products, "sheet1", true);
mapper.Put(orders, "sheet2", false);
mapper.Save("Book1.xlsx");
Using c# I can successfully open an excel document and read the data in the first worksheet with the code below. However, my .xlsx has multiple worksheets so I would like to loop through the worksheet collection rather than hard coding the name of each worksheet. Many thanks.
string path = @"C:\Extract\Extract.xlsx";
string connStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path + ";Extended Properties=Excel 12.0;";
string sql = "SELECT * FROM [Sheet1$]";
using (OleDbDataAdapter adaptor = new OleDbDataAdapter(sql, connStr))
{
DataSet ds = new DataSet();
adaptor.Fill(ds);
DataTable dt = ds.Tables[0];
}
I used most of the code in the answer to "Reading multiple excel sheets with different worksheet names", which was kindly pointed out to me in a comment on my question.
It wouldn't compile for me in VS 2013, though, because the DataRow object does not have an Item property (r.Item(0).ToString in that code), so I changed that part. It also brought back an entry with Print_Area in its name, which is not a valid worksheet, so I excluded it in my loop. Here is the code as it worked for me.
string path = @"C:\Extract\Extract.xlsx";
string connStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path + ";Extended Properties=Excel 12.0;";
DataTable sheets = GetSchemaTable(connStr);
string sql = string.Empty;
DataSet ds = new DataSet();
foreach (DataRow dr in sheets.Rows)
{ //Print_Area
string WorkSheetName = dr["TABLE_NAME"].ToString().Trim();
if (!WorkSheetName.Contains("Print_Area"))
{
sql = "SELECT * FROM [" + WorkSheetName + "]";
ds.Clear();
OleDbDataAdapter data = new OleDbDataAdapter(sql, connStr);
data.Fill(ds);
DataTable dt1 = ds.Tables[0];
foreach (DataRow dr1 in dt1.Rows)
{
//parsing work
}
}
}
static DataTable GetSchemaTable(string connectionString)
{
using (OleDbConnection connection = new
OleDbConnection(connectionString))
{
connection.Open();
DataTable schemaTable = connection.GetOleDbSchemaTable(
OleDbSchemaGuid.Tables,
new object[] { null, null, null, "TABLE" });
return schemaTable;
}
}
I'm about to work on almost the same problem.
I found the guide at http://www.dotnetperls.com/excel quite useful.
In short, to open worksheet no. 3, add the following code after opening the excel workbook:
var worksheet = workbook.Worksheets[3] as
Microsoft.Office.Interop.Excel.Worksheet;
Hope this answered your question.
I'd recommend using EPPlus (available via NuGet: https://www.nuget.org/packages/EPPlus/ ). It's a great wrapper for working with .xlsx spreadsheets in .NET. Worksheets are exposed as a collection, so you can do what you want by just looping over them, regardless of name or index.
For example,
using (ExcelPackage package = new ExcelPackage(new FileInfo(sourceFilePath)))
{
foreach (var excelWorksheet in package.Workbook.Worksheets)
...
}
You should try the Open XML Format SDK (available on NuGet). The link below explains both reading and writing Excel documents:
http://www.codeproject.com/Articles/670141/Read-and-Write-Microsoft-Excel-with-Open-XML-SDK
By the way, Office doesn't have to be installed to use it.
I am using Asp.net with C#. I need to import data from an Excel sheet to a DataTable. The sheet has 100,000 records with four columns: Firstname, LastName, Email,Phone no.
How can I do this?
Use the following code:
public static DataTable exceldata(string filePath)
{
DataTable dtexcel = new DataTable();
bool hasHeaders = false;
string HDR = hasHeaders ? "Yes" : "No";
string strConn;
if (filePath.Substring(filePath.LastIndexOf('.')).ToLower() == ".xlsx")
strConn = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + filePath + ";Extended Properties=\"Excel 12.0;HDR=" + HDR + ";IMEX=0\"";
else
strConn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filePath + ";Extended Properties=\"Excel 8.0;HDR=" + HDR + ";IMEX=0\"";
OleDbConnection conn = new OleDbConnection(strConn);
conn.Open();
DataTable schemaTable = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, new object[] { null, null, null, "TABLE" });
//Looping Total Sheet of Xl File
/*foreach (DataRow schemaRow in schemaTable.Rows)
{
}*/
//Looping a first Sheet of Xl File
DataRow schemaRow = schemaTable.Rows[0];
string sheet = schemaRow["TABLE_NAME"].ToString();
if (!sheet.EndsWith("_"))
{
string query = "SELECT * FROM [" + sheet + "]";
OleDbDataAdapter daexcel = new OleDbDataAdapter(query, conn);
dtexcel.Locale = CultureInfo.CurrentCulture;
daexcel.Fill(dtexcel);
}
conn.Close();
return dtexcel;
}
Source: http://www.codeproject.com/Questions/445400/Read-Excel-Sheet-Data-into-DataTable
If you need a faster import, you may also refer to the question "Importing Excel into a DataTable Quickly".
I'm not sure if this will work in ASP.NET but it works in WPF so maybe there's something you can take from it?
Anyway, at the global scope:
Microsoft.Office.Interop.Excel.Application xls;
Then to select and read a spreadsheet:
private void readSheet()
{
// Initialise and open file picker
OpenFileDialog openfile = new OpenFileDialog();
openfile.DefaultExt = ".xlsx";
openfile.Filter = "Office Files|*.xls;*.xlsx";
var browsefile = openfile.ShowDialog();
if (browsefile == true)
{
string path = openfile.FileName;
xls = new Microsoft.Office.Interop.Excel.Application();
// Dynamic File Using Uploader... Note the readOnly flag is true
Workbook excelBook = xls.Workbooks.Open(path, 0, true, 5, "", "", true, XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
Worksheet excelSheet = (Worksheet)excelBook.Worksheets.get_Item(1);
Range excelRange = excelSheet.UsedRange;
// Make default cell contents
string strCellData = String.Empty;
double douCellData;
// Initialise row and column
int rowCnt, colCnt = 0;
// Initialise DataTable
System.Data.DataTable dt = new System.Data.DataTable();
// Loop through first row of columns to make header
for (colCnt = 1; colCnt <= excelRange.Columns.Count; colCnt++)
{
string strColumn = "";
strColumn = Convert.ToString((excelRange.Cells[1, colCnt] as Range).Value2);
var Column = dt.Columns.Add();
Column.DataType = Type.GetType("System.String");
// Check & rename for duplicate entries
if (dt.Columns.Contains(strColumn))
Column.ColumnName = (strColumn + ", " + colCnt);
else
Column.ColumnName = strColumn;
}
dt.AcceptChanges();
// Fill in the rest of the cells
for (rowCnt = 2; rowCnt <= excelRange.Rows.Count; rowCnt++)
{
string strData = "";
for (colCnt = 1; colCnt <= excelRange.Columns.Count; colCnt++)
{
try
{
strCellData = Convert.ToString((excelRange.Cells[rowCnt, colCnt] as Range).Value2);
strData += strCellData + "|";
}
catch (Exception ex)
{
douCellData = (excelRange.Cells[rowCnt, colCnt] as Range).Value2;
strData += douCellData.ToString() + "|";
Console.Write(ex.ToString());
}
}
strData = strData.Remove(strData.Length - 1, 1);
dt.Rows.Add(strData.Split('|'));
}
dtGrid.ItemsSource = dt.DefaultView;
try
{
excelBook.Close(true, null, null);
}
catch (System.Runtime.InteropServices.COMException comEX)
{
Console.Write("COM Exception: " + comEX.ToString());
}
xls.Quit();
}
}
Speed Problems?
I'll note several ways to do this:
ODBC (answered here)
Interop (answered here)
These have drawbacks: they might not be fast, and Interop requires Excel to be installed, launches it, and can cause lots of problems when it is run repeatedly or when the web server tries to run it.
Please try @milan_m's solution first. If it has problems, come back here.
Some faster, potentially better solutions are:
NPOI
Save-As CSV
NPOI is available as a NuGet package for C# and reads Excel files very well. Yes, this is a product recommendation, even though you didn't ask for one. It is a well-maintained project and will be relevant for readers into the 2030s.
https://github.com/nissl-lab/npoi
You'll want to use NPOI if ODBC is too slow and your users upload a different XLSX file as part of the use case. If there are only 2 or 3 internal power users you are in contact with, you can instead require them to upload the file as CSV.
What if the use case is that you use a single .XLSX file that is the same for all users and you deploy it with the app?
You didn't mention whether this is the case, and it makes a HUGE difference. You will definitely be miles ahead if you save the file as CSV and consume that at program startup. Or, if you need it in a DataTable, import it into the DataTable at dev time and save it to an XML file using a method on the DataSet object (I believe you have to put the table into a DataSet to save it as XML; many examples abound).
If you need to do super flexible lookups, a datatable (or object collection and linq-to-objects) is good.
However if you have to look up items at extreme speed, and not via ranges but just by exact match, load it to a dictionary or dictionaries at startup time, from a CSV or similar.
I did this for a power user who was searching a spreadsheet of about 2-3 lakh (200,000-300,000) records with Interop and Excel; the operation went from 90 minutes to 30 seconds. Literally. A Dictionary is about the fastest way to look things up in .NET, provided the data fits in memory and you don't have to reload different data all the time (in that case, keep your RDBMS).
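To make the dictionary idea concrete, here is a minimal sketch of loading a CSV into an exact-match index at startup. It assumes a header row, no quoted commas inside fields, and that the first column is the lookup key. The CsvLookup class name and the column layout are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CsvLookup
{
    // Builds an exact-match index: first column is the key, the split row is the value.
    // Assumption: simple CSV with a header line and no quoted or embedded commas.
    public static Dictionary<string, string[]> Load(TextReader reader)
    {
        var index = new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase);
        reader.ReadLine(); // skip the header row
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue; // ignore blank lines
            string[] fields = line.Split(',');
            index[fields[0]] = fields; // the last row wins on duplicate keys
        }
        return index;
    }
}
```

At startup you would call CsvLookup.Load(new StreamReader(path)) once and keep the dictionary around; each lookup is then a single TryGetValue call rather than a scan of the sheet.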
Oops.
Just now saw this question is 7 years old. @#$^@#!!!
I'm working on a tool for work that converts an Excel list of instructions into a better-looking, formatted Word document. I connect to the Excel document and load the sheet into a DataTable for easier access.
I had just finally gotten the borders and formatting right for my Word document when I started getting an error:
External table is not in the expected format.
Here is the full connection algorithm:
public static DataTable getWorkSheet(string excelFile =
"C:\\Users\\Mitch\\Dropbox\\Work tools\\Excel for andrew\\Air Compressor PM's.xlsx") {
string connection = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + excelFile
+ ";Extended Properties='Excel 8.0;HDR=YES;'";
string sql = null;
string worksheetName = null;
string[] Headers = new string[4];
DataTable schema = null;
DataTable worksheet = null;
DataSet workbook = new DataSet();
//Preparing and opening connection
OleDbConnection objconn = new OleDbConnection(connection);
objconn.Open();
//getting the schema data table
schema = objconn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
worksheetName = schema.Rows[0]["Table_Name"].ToString();
//Each worksheet will have a varying name, so the name is just called from
//the dataTable.rows array. Can be later modified to use multiple
//worksheets within a workbook.
sql = "SELECT * FROM[" + worksheetName + "]";
//data adapter
OleDbDataAdapter objAdapter = new OleDbDataAdapter();
//pass the sql
objAdapter.SelectCommand = new OleDbCommand(sql, objconn);
//populate the dataset
objAdapter.Fill(workbook);
//Remove spaces from the headers.
worksheet = workbook.Tables[0];
for (int x = 0; x < Headers.Count(); x++) {
Headers[x] = worksheet.Columns[x].ColumnName;
worksheet.Columns[x].ColumnName = worksheet.Columns[x].ColumnName.Replace(" ", "");
}
return worksheet;
}//end of getWorksheet
EDIT: I pulled up my old code from Dropbox's previous versions, which was definitely working, and re-downloaded a copy of the Excel document that I know was working. What gives? Has something changed on my computer?
You are connecting to a 2007/2010 Excel file (*.xlsx, *.xlsm), but the Jet 4.0 provider only reads the older .xls format. You need the updated 2010 drivers (ACE), which can be downloaded for free. The correct connection string can be obtained from http://connectionstrings.com/Excel and http://connectionstrings.com/Excel-2007
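As a rough illustration, the sketch below builds an ACE connection string based on the file extension. It assumes the ACE provider is installed on the machine, and the Extended Properties values follow the forms commonly listed for each file type. The helper name ExcelConnectionStrings.ForFile is hypothetical.

```csharp
using System;
using System.IO;

public static class ExcelConnectionStrings
{
    // Returns an ACE OLE DB connection string for the given workbook path.
    // Assumption: Microsoft.ACE.OLEDB.12.0 is installed; it is not checked here.
    public static string ForFile(string path, bool hasHeaders = true)
    {
        string ext = Path.GetExtension(path).ToLowerInvariant();
        string format;
        if (ext == ".xlsx") format = "Excel 12.0 Xml";
        else if (ext == ".xlsm") format = "Excel 12.0 Macro";
        else format = "Excel 8.0"; // legacy .xls workbooks

        string hdr = hasHeaders ? "YES" : "NO";
        return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path
            + ";Extended Properties=\"" + format + ";HDR=" + hdr + "\"";
    }
}
```

In getWorkSheet above, the hard-coded Jet string could then be replaced with ExcelConnectionStrings.ForFile(excelFile).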