Read Excel cell values with SSIS script task - c#

I am trying to read an Excel file via an SSIS Script Task to check for certain cell values in that worksheet.
In the code example you can see that the strSQL is set to "H4:H4" to only read one cell. This cell can only have a true or false value.
Since I also need to check for a certain string value in B1 I wanted to extend this version.
string filePath = "c:\\test\\testBoolean.XLSX";
string tabName = "testSheet$";
string strSQL = "Select * From [" + tabName + "H4:H4]";
String strCn = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source="
+ filePath + ";Extended Properties=\"Excel 12.0;HDR=NO;IMEX=1\";";
OleDbConnection cn = new OleDbConnection(strCn);
int iCnt = 0;
OleDbDataAdapter objAdapter = new OleDbDataAdapter(strSQL, cn);
DataSet ds = new DataSet();
objAdapter.Fill(ds, tabName);
DataTable dt = ds.Tables[tabName];
foreach (DataRow row in dt.Rows)
{
iCnt = iCnt + 1;
// some processing....
}
What I don't understand is why I get a boolean value with the above strSQL statement, or with any statement whose range stays within the same row, like so:
string strSQL = "Select * From [" + tabName + "F4:H4]";
Debug-Output:
row.ItemArray[2] false object {bool}
But when I set a different range like this one:
string strSQL = "Select * From [" + tabName + "F1:H4]";
I lose the recognition of the bool value:
row.ItemArray[2] "FALSE" object {string}
I'd much rather use the bool value for other processing tasks.
How can I fix this in addition to also reading the B2 value?

Your connection string specified IMEX=1, which tells the driver to treat intermixed data types as text. (See the "Usage Considerations" section of the MSDN article Excel Connection Manager.)
Thus, when you specified a single row
string strSQL = "Select * From [" + tabName + "F4:H4]";
there was only one possible data type for the third column, and the driver was able to correctly infer it. However, when you specified multiple rows
string strSQL = "Select * From [" + tabName + "F1:H4]";
and any value in the range H1:H4 was not a bool, the driver translated all values in that column to strings.
Assuming that you do in fact have mixed data types in column H and only care about the values in two particular cells, the simplest solution is to query each cell individually. See Import a single Excel cell into SSIS for some ideas on how to do that.
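As a rough illustration of querying each cell on its own, here is a minimal sketch. The class and method names (ExcelCellReader, BuildCellQuery, ReadCell) are made up for this example, and it assumes the same connection string as in the question (HDR=NO) and that System.Data.OleDb is available to the project.

```csharp
using System;
using System.Data;
using System.Data.OleDb;

// Hypothetical helper, not part of SSIS or ADO.NET.
public static class ExcelCellReader
{
    // Builds a range query for exactly one cell, e.g. [testSheet$H4:H4].
    public static string BuildCellQuery(string tabName, string cell)
    {
        return "SELECT * FROM [" + tabName + cell + ":" + cell + "]";
    }

    // Returns the raw value of a single cell, or null when nothing comes back.
    public static object ReadCell(string connectionString, string tabName, string cell)
    {
        using (var cn = new OleDbConnection(connectionString))
        using (var adapter = new OleDbDataAdapter(BuildCellQuery(tabName, cell), cn))
        {
            var dt = new DataTable();
            adapter.Fill(dt); // Fill opens and closes the connection itself
            if (dt.Rows.Count == 0 || dt.Rows[0].IsNull(0))
            {
                return null;
            }
            return dt.Rows[0][0];
        }
    }
}
```

Calling ExcelCellReader.ReadCell(strCn, "testSheet$", "H4") and then ExcelCellReader.ReadCell(strCn, "testSheet$", "B1") keeps each query to a single cell, so the driver should only have one value to infer a type from in each case.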

I would clone most of the code to produce two separate SELECT statements, one for each of the two cells you are after.
Actually, I would probably go further and split the whole script into SSIS components, e.g. Execute SQL Tasks or Data Flow Tasks.

Related

Using an MS Access Lookup field in C#

I am building an extension to an existing Access database and an accompanying front end programmed in C#. The original Access database was not designed very well, and certainly not with future expansion in mind. For simplicity's sake, let's say the legacy DB has two tables: tblEmployee [empId(AutoNumber), empName(Text)] and tblProjects [prjId(AutoNumber), prjName(Text), prjEmps(Number/Lookup)]. Both tables have an AutoNumber primary key. The Projects table has a multi-value lookup field that allows users to assign multiple employees to a project. When I query the tblProjects table in Access with SELECT prjId, prjName, prjEmps FROM tblProject;, the prjEmps field lists all the employees' names separated by commas. However, when I run the same query from C#, prjEmps returns a string version of a number that is not the empId of the employee(s). I am not sure if it makes a difference, but I am using the System.Data.OleDb and System.Data namespaces in C#. Here is the gist of my C# code:
string connStr = @"Provider = Microsoft.ACE.OLEDB.12.0; " +
@"Data Source=" + dbFilePath;
string query = "SELECT prjId, prjName, prjEmps FROM tblProject;";
OleDbConnection dbConn = new OleDbConnection(connStr);
OleDbCommand Cmd = new OleDbCommand(query, dbConn);
OleDbDataAdapter adp = new OleDbDataAdapter(query, dbConn);
DataTable dt = new DataTable();
adp.Fill(dt);
dbConn.Close();
foreach (DataRow row in dt.Rows)
{
int prjId = row.Field<int>("prjId");
string prjName = row.Field<string>("prjName");
string prjEmps = row.Field<string>("prjEmps");
MessageBox.Show("Project ID: " + prjId.ToString() + "\n" +
"Project Name: " + prjName + "\n" +
"Employees: " + prjEmps);
}
I would be happy if I could just get the concatenated list of names, but I would prefer an array of integer keys or the like. Any ideas on how to fix this?
Use the ODBC provider. OLEDB does not support multi-value lookup fields, and you get garbage values if you use it to read one. With ODBC you will get ";"-separated values, which can then be split into individual values or have the ";" replaced with ",".
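For the splitting step, a minimal sketch follows. It assumes the field arrives as integer keys separated by ";" (for example "1;3;7"), as described above; that format has not been verified here, and the MultiValueField and ParseKeys names are invented for illustration. If the field turns out to contain names rather than keys, int.Parse will throw a FormatException.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper for turning a ";"-separated key list into an int array.
public static class MultiValueField
{
    public static int[] ParseKeys(string raw)
    {
        // An empty or missing field means no employees are assigned.
        if (string.IsNullOrWhiteSpace(raw))
        {
            return new int[0];
        }

        return raw.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
                  .Select(part => int.Parse(part.Trim(), CultureInfo.InvariantCulture))
                  .ToArray();
    }
}
```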

Reading Excel with OleDbDataReader - cannot read values from a specific column

I'm working on existing C# code that reads an Excel file with OleDbDataReader, but I can't get the content of the cells in two specific columns.
This is the connection code:
connection = new OleDbConnection(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source="
+ pathExcel + ";Extended Properties=\"Excel 12.0;HDR=YES;IMEX=1\";");
connection.Open();
And to access the content of the default sheet:
tables = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
sheetsNames = from DataRow row in tables.Rows select row["TABLE_NAME"].ToString();
sql = "SELECT * FROM [" + sheetsNames.FirstOrDefault() + "];";
ocmd = new OleDbCommand(sql, connection);
reader = ocmd.ExecuteReader(); //OleDbDataReader
So, finally I read all the content, but for some columns I can't access cells content (reader["mycolumn"]). So, I tried this:
while (reader.Read()){
// Test code, I tried different ways to read cell content
// It's working
string colName = reader.GetName(26);
string val1 = reader[colName].ToString();
string val2 = reader.GetValue(26).ToString();
// Same code, changing index 26 to 27
... // always empty values. Bug ??
}
If I evaluate the expression "reader.GetValue(26)" it returns the expected value, but "reader.GetValue(27)" throws an exception ("This expression causes side effects and will not be evaluated"); in particular, it behaves like an index-out-of-range exception. Yet I can read data from the following columns (29, 30...).
Do you have any idea about the cause?

Data is missing while reading excel file using OLEDB

I am using OLEDB to read an Excel file into a DataTable, but some values come back missing (empty). In my Excel sheet, one column has the General data type and holds mixed values, both strings and integers. Most of the cell values are integers. Why is OLEDB skipping the string values?
OleDbConnection connection = new OleDbConnection();
connection.ConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + filePath + "; Extended Properties=\"Excel 12.0;IMEX=1\";";
OleDbCommand myAccessCommand = new OleDbCommand();
myAccessCommand.CommandText = "Select * from [" + sheetName + "]";
OleDbDataAdapter myDataAdapter = new OleDbDataAdapter(myAccessCommand);
myDataAdapter.Fill(myDataSet);
Check the following link and see the points under "RESOLUTION":
http://support.microsoft.com/kb/194124
Please see the NOTE in point 2.
The effect of IMEX=1 depends entirely on your registry settings. By default, the first 8 rows are checked to determine the data type. IMEX=1 can give unpredictable behavior, such as skipping string values. There is also a small workaround for this problem: add a single quote (') before every cell value in Excel, and every cell will be treated as a string.
Add IMEX=1 to the connection string as below:
string con = string.Format(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};" + @"Extended Properties='Excel 8.0;HDR=Yes;IMEX=1'", fileName);
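The connection strings on this page differ only in provider, Excel version, HDR and IMEX, so one way to keep them consistent is a small builder. This is only a sketch: the ExcelConnection class and its Build method are hypothetical, and the pairing of .xls with Jet/"Excel 8.0" and everything else with ACE/"Excel 12.0" is an assumption that simply mirrors the strings used in the examples here.

```csharp
using System;
using System.IO;

// Hypothetical helper that assembles an Excel OLE DB connection string.
public static class ExcelConnection
{
    public static string Build(string filePath, bool hasHeader, bool importMode)
    {
        // Assumption: .xls files use the Jet provider, all others use ACE.
        bool legacy = string.Equals(Path.GetExtension(filePath), ".xls",
                                    StringComparison.OrdinalIgnoreCase);
        string provider = legacy ? "Microsoft.Jet.OLEDB.4.0" : "Microsoft.ACE.OLEDB.12.0";
        string version = legacy ? "Excel 8.0" : "Excel 12.0";

        // IMEX=1 is only appended when import mode is requested.
        string properties = version
            + ";HDR=" + (hasHeader ? "YES" : "NO")
            + (importMode ? ";IMEX=1" : "");

        return "Provider=" + provider
            + ";Data Source=" + filePath
            + ";Extended Properties=\"" + properties + "\";";
    }
}
```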

How do I sum all rows of a specific header in an excel file with c#?

I have an Excel workbook with one sheet. That sheet has headers in row 1.
One of the headers is Amount.
I want to read all the rows under that header and sum them into a variable of type float, regardless of the number of rows, which is never the same.
I'm doing this with C#.
I open the workbook and get the active sheet, and then I get stuck.
How do I go about this?
Rui Martins
You could use OleDB instead of Excel.Interop
string con = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\test.xls;" +
"Extended Properties='Excel 8.0;HDR=Yes;'";
using(OleDbConnection c = new OleDbConnection(con))
{
c.Open();
string selectString = "SELECT SUM(Amount) FROM [Sheet1$]";
using(OleDbCommand cmd1 = new OleDbCommand(selectString))
{
cmd1.Connection = c;
var result = cmd1.ExecuteScalar();
Console.WriteLine(result);
}
}
This example uses the old Microsoft.Jet.OLEDB.4.0 provider, but it works equally well with the newer Microsoft.ACE.OLEDB.12.0.
Take a look at this article, you should be able to loop through the rows and get the total by adding the cell values altogether.
MSDN article on how to retrieve excel cell values.
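The row-by-row approach can be sketched as follows, assuming the sheet has already been loaded into a DataTable (for instance with an OleDbDataAdapter). The AmountSummer class and SumColumn method are illustrative names, and silently skipping empty or non-numeric cells is a design choice of this sketch rather than something the question specifies.

```csharp
using System;
using System.Data;
using System.Globalization;

// Hypothetical helper that totals one column of an already loaded DataTable.
public static class AmountSummer
{
    public static float SumColumn(DataTable table, string columnName)
    {
        double total = 0;
        foreach (DataRow row in table.Rows)
        {
            // Empty cells arrive as DBNull and are skipped.
            if (row.IsNull(columnName))
            {
                continue;
            }

            string text = Convert.ToString(row[columnName], CultureInfo.InvariantCulture);
            double value;
            // Cells that do not parse as numbers are skipped as well.
            if (double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out value))
            {
                total += value;
            }
        }
        // Accumulate in double, then narrow to the float the question asks for.
        return (float)total;
    }
}
```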

OleDB Can't Retrieve Rows With Different DataType

I am trying to retrieve a DataTable from an .xls file. Below is my code:
OleDbConnection MyConnection = null;
DataSet DtSet = null;
OleDbDataAdapter MyCommand = null;
MyConnection = new OleDbConnection("provider=Microsoft.Jet.OLEDB.4.0; Data Source='" + path + "';Extended Properties=Excel 8.0;");
//path is where the .xls file located
ArrayList TblName = new ArrayList();
MyConnection.Open();
DataTable schemaTable = MyConnection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, new object[] { null, null, null, "TABLE" });
foreach (DataRow row in schemaTable.Rows)
{
TblName.Add(row["TABLE_NAME"]);
}
MyCommand = new System.Data.OleDb.OleDbDataAdapter("select * from [" + TblName[0].ToString() + "] order by Material", MyConnection);
DtSet = new System.Data.DataSet();
MyCommand.Fill(DtSet);
MyCommand.FillSchema(DtSet, SchemaType.Source);
DataTable dt = new DataTable();
dt = DtSet.Tables[0];
MyConnection.Close();
The problem is that I have some inconsistent rows in my table, meaning they don't follow the data type of the other rows.
Let's say that in column A, I have cells that are supposed to look like:
105161610
146161701
196171717
.........
That is, they are supposed to be of the Int32 data type.
These make up the majority of the cells in the column.
I also have some other cells (still in the same column) that look like:
ABC9012
KDJ0981
KLP0001
.......
These are effectively of the string data type.
When I execute the code, only the cells of int type are retrieved; cells of the other type (string) are set to null instead, even though my query is a plain select *.
Can someone advise me on how to consistently retrieve both kinds of data type (instead of only one, as happens now)?
You have to cast or convert both types of data to the SQL equivalent of a string, such as varchar.
Try either one of the following:
1. select cast(Column_A as varchar) Column_A from TableName order by Material
2. select convert(varchar, Column_A) Column_A from TableName order by Material
Add IMEX=1; HDR={1} to the Excel connection string.
Description: IMEX=1 forces mixed data to be converted to text.
HDR={1} indicates whether the first row contains column names rather than data; put No if you do not want a header row.
