C# reading Excel missing column - c#

I have excel (.xls) file with 4 column and all columns are "general" and with no header.
Column A and B contains some text, column C contains numbers, column D contains text.
When I read data (code below) reader can see only 3 columns (A,B,C), reader dose not see (recognize) column D (picture below).
I have added IMEX=1 in connection string, I have changed registry - TypeGuessRows set it to 0 (image below), but all of this did not help me - still have same problem.
Does anyone know why after column that contains numbers JET dose not see txt column?
string sourceFileServer = #"C:\TEST\Excel1.xls";
using (OleDbConnection connection = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + sourceFileServer + ";Extended Properties=\"Excel 8.0;HDR=NO;IMEX=1\";"))
{
OleDbCommand command = new OleDbCommand("SELECT * FROM [Sheet1$];", connection);
connection.Open();
OleDbDataReader reader = command.ExecuteReader();
int _row = 0;
while (reader.Read())
{
_row++;
try
{
string _v0 = ((string)reader[0]).Trim();
string _v1 = ((string)reader[1]).Trim();
string _v2 = reader[2].ToString().Trim();
string _v3 = ((string)reader[3]).Trim();
MessageBox.Show(_v0 + "|" + _v1 + "|" + _v2 + "|" + _v3 + "|");
}
catch (Exception xcp)
{
MessageBox.Show(xcp.Message);
}
break;
}
reader.Close();
}
reader
registry

Related

Query string selections for XLS spreadsheet C#

I am trying to grab cells in XLS spreadsheets, assign them to string arrays, then manipulate the data and export to multiple CVS files.
The trouble is the XLS spreadsheet contains information that is not relevant, useable data doesn't start till row 17 and columns have no headings with just the default Sheet1.
I have looked at related questions and tried figuring it out myself with no success. The following code to read the XLS kinda works but is messy to work with as the row lengths vary from one XLS file to another and it is automatically pulling empty columns and rows.
CODE
public static void xlsReader()
{
string fileName = string.Format("{0}\\LoadsAvailable.xls", Directory.GetCurrentDirectory());
string connectionString = #"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + fileName + ";" + #"Extended Properties='Excel 8.0;HDR=Yes;'";
string queryString = "SELECT * FROM [Sheet1$]";
using (OleDbConnection connection = new OleDbConnection(connectionString))
{
OleDbCommand command = new OleDbCommand(queryString, connection);
connection.Open();
OleDbDataReader reader = command.ExecuteReader();
int counter = 0;
while (reader.Read())
{
Console.WriteLine("Line " + counter + ":" + reader[28].ToString()); // Just for testing
counter++;
}
}
}
I could do a bunch of trickery with loops to get the data that is required but there has to be a query string that could get the data from row 17 with only 8 columns(not 26 columns with 18 empty)?
I have tried many query string examples and can not seam to get any to work with a starting row index or filter out the empty data.
Here is a handy method that converts an excel file to a flat file.
You may want to change the connection string properties to suit your needs. I needed headers for my case.
Note you will need the Access database engine installed on your machine. I needed the 32 bit version since the app i dev'd was 32 bit. I bet you will also need it.
I parameterized the delimiter for the flat file, because I had cases where I didn't need a comma but a pipe symbol.
How to call method ex: ConvertExcelToFlatFile(openFileName, savePath, '|'); // pipe delimited
// Converts Excel To Flat file
private void ConvertExcelToFlatFile(string excelFilePath, string csvOutputFile, char delimeter, int worksheetNumber = 1)
{
if (!File.Exists(excelFilePath)) throw new FileNotFoundException(excelFilePath);
if (File.Exists(csvOutputFile)) throw new ArgumentException("File exists: " + csvOutputFile);
// connection string
var cnnStr = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0 Xml; IMEX=1; HDR=NO\"", excelFilePath);
var cnn = new OleDbConnection(cnnStr);
// get schema, then data
var dt = new DataTable();
try
{
cnn.Open();
var schemaTable = cnn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if (schemaTable.Rows.Count < worksheetNumber) throw new ArgumentException("The worksheet number provided cannot be found in the spreadsheet");
string worksheet = schemaTable.Rows[worksheetNumber - 1]["table_name"].ToString().Replace("'", "");
string sql = String.Format("select * from [{0}]", worksheet);
var da = new OleDbDataAdapter(sql, cnn);
da.Fill(dt);
}
catch (Exception e)
{
throw e;
}
finally
{
// free resources
cnn.Close();
}
// write out CSV data
using (var wtr = new StreamWriter(csvOutputFile)) // disposes file handle when done
{
foreach (DataRow row in dt.Rows)
{
//MessageBox.Show(row.ItemArray.ToString());
bool firstLine = true;
foreach (DataColumn col in dt.Columns)
{
// skip the first line the initial
if (!firstLine)
{
wtr.Write(delimeter);
}
else
{
firstLine = false;
}
var data = row[col.ColumnName].ToString();//.Replace("\"", "\"\""); // replace " with ""
wtr.Write(String.Format("{0}", data));
}
wtr.WriteLine();
}
}
}

While reading Excel sheet if the value of column is numberic then it returns null in datatable

I am reading values from Excel sheet. Columns generally contain String but sometimes it may contain numeric values. While reading Excel sheet into datatable numeric value is read as blank.
It reads 90004 as null, But if I sort this column by numeric the it reads numeric value and gives string value as null and if I sort this column by string then it reads string value and gives numeric as null.
AC62614 abc EA MISC
AC62615 pqr EA MISC
AC62616 xyz EA MISC
AC62617 test EA 90004
AC62618 test3 TO MISC
AC62619 test3 TO STEEL
my code:
public static DataTable ReadExcelFile(FileUpload File1, string strSheetName)
{
string strExtensionName = "";
string strFileName = System.IO.Path.GetFileName(File1.PostedFile.FileName);
DataTable dtt = new DataTable();
if (!string.IsNullOrEmpty(strFileName))
{
//get the extension name, check if it's a spreadsheet
strExtensionName = strFileName.Substring(strFileName.IndexOf(".") + 1);
if (strExtensionName.Equals("xls") || strExtensionName.Equals("xlsx"))
{
/*Import data*/
int FileLength = File1.PostedFile.ContentLength;
if (File1.PostedFile != null && File1.HasFile)
{
//upload the file to server
//string strServerPath = "~/FolderName";
FileInfo file = new FileInfo(File1.PostedFile.FileName);
string strServerFileName = file.Name;
string strFullPath = HttpContext.Current.Server.MapPath("UploadedExcel/" + strServerFileName);
File1.PostedFile.SaveAs(strFullPath);
//open connection out to read excel
string strConnectionString = string.Empty;
if (strExtensionName == "xls")
strConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source="
+ strFullPath
+ ";Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=2\"";
else if (strExtensionName == "xlsx")
strConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source="
+ strFullPath
+ ";Extended Properties=\"Excel 12.0;HDR=Yes;IMEX=2\"";
if (!string.IsNullOrEmpty(strConnectionString))
{
OleDbConnection objConnection = new OleDbConnection(strConnectionString);
objConnection.Open();
DataTable oleDbSchemaTable = objConnection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
bool blExists = false;
foreach (DataRow dtr in oleDbSchemaTable.Rows)
{
//reads from the spreadsheet called 'Sheet1'
if (dtr["TABLE_NAME"].ToString() == "" + strSheetName + "$")
{
blExists = true;
break;
}
}
if (blExists)
{
OleDbCommand objCmd = new OleDbCommand(string.Format("Select * from [{0}$]", strSheetName), objConnection);
OleDbDataAdapter objAdapter1 = new OleDbDataAdapter();
objAdapter1.SelectCommand = objCmd;
DataSet objDataSet = new DataSet();
objAdapter1.Fill(objDataSet);
objConnection.Close();
dtt = objDataSet.Tables[0];
}
}
}
}
}
return dtt;
}
If you change IMEX=2 in you connectionstring to IMEX = 1 the columns will be interpreted as text. Then you can get all data of the Sheet and check with Int32.TryParse() if the value is numeric or not.
Connecting string:
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + Path + ";Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1\"
IMEX=1 For Read All value as Text from Excel file
HDR =Yes For Excel First row as column read
IMEX=1 not use Than Numeric value will be NULL read

Search a row from a table which starts with asterisk symbol

I am uploading an excel file by using c# and i am selecting a column named LOT from that excel file.In column LOT one row is having number starting with asterisk symbol(*). I need a query to select all the rows. please Help me out in this.
I tried the below sample code but it is not working fine. Its returning the lots which doesn't have asterisk symbol(*).
protected void btnUpload_Click(object sender, EventArgs e)
{
//Get path from web.config file to upload
string FilePath = ConfigurationManager.AppSettings["FilePath"].ToString();
string filename = string.Empty;
//To check whether file is selected or not to uplaod
if (BrowseFile.HasFile)
{
try
{
string[] allowdFile = { ".xls", ".xlsx" };
//Here we are allowing only excel file so verifying selected file pdf or not
string FileExt = System.IO.Path.GetExtension(BrowseFile.PostedFile.FileName);
//Check whether selected file is valid extension or not
bool isValidFile = allowdFile.Contains(FileExt);
if (!isValidFile)
{
lblMsg.ForeColor = System.Drawing.Color.Red;
lblMsg.Text = "Please upload only Excel";
}
else
{
// Get size of uploaded file, here restricting size of file
int FileSize = BrowseFile.PostedFile.ContentLength;
if (FileSize <= 1048576)//1048576 byte = 1MB
{
//Get file name of selected file
filename = Path.GetFileName(Server.MapPath(BrowseFile.FileName));
//Save selected file into server location
BrowseFile.SaveAs(Server.MapPath(FilePath) + filename);
//Get file path
string filePath = Server.MapPath(FilePath) + filename;
//Open the connection with excel file based on excel version
OleDbConnection con = null;
if (FileExt == ".xls")
{
con = new OleDbConnection(#"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filePath + ";Extended Properties=Excel 8.0;");
}
else if (FileExt == ".xlsx")
{
con = new OleDbConnection(#"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + filePath + ";Extended Properties=Excel 12.0;");
}
con.Open();
//Get the list of sheet available in excel sheet
DataTable dt = con.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
//Get first sheet name
string getExcelSheetName = dt.Rows[0]["Table_Name"].ToString();
//Select rows from first sheet in excel sheet and fill into dataset
//OleDbCommand ExcelCommand = new OleDbCommand(#"SELECT LOT FROM [" + getExcelSheetName + #"] WHERE LOT IS NOT NULL", con);
OleDbCommand ExcelCommand = new OleDbCommand(#"SELECT LOT FROM [" + getExcelSheetName + #"] WHERE LOT LIKE '**'", con);
OleDbDataAdapter ExcelAdapter = new OleDbDataAdapter(ExcelCommand);
DataTable ExcelDataSet = new DataTable();
ExcelAdapter.Fill(ExcelDataSet);
//Holding the data into the list
List<DataRow> strLot = ExcelDataSet.AsEnumerable().ToList();
con.Close();
//string lots = "";
//seperating the each lot with a comma separator
for (int i = 0; i < strLot.Count; i++)
{
totLots += ExcelDataSet.Rows[i].ItemArray.GetValue(0).ToString() + ",";
}
//Removing the last comma separator
totLots = totLots.Remove(totLots.Length - 1);
You can use the LIKE operator
WHERE <Column> LIKE '\**'
Also, depending on your data source, the wildcard '*' could be replaced with '%' and 'LIKE' keyword could be replaced with 'ALIKE'
with '%' the search text will become '*%' instead of '\**'
Edit: Code below checked with .xls and .xlsx files
OleDbCommand ExcelCommand = new OleDbCommand(#"SELECT LOT FROM [" + getExcelSheetName + #"] WHERE LOT LIKE '*%'", con);
Change your command text to the following...
#"SELECT [LOT] FROM [" + getExcelSheetName + #"] WHERE [LOT] LIKE '\*%')"
This should preform a simple pattern match like a SQL script and look for any member of the column starting with an asterisk and containing anything at all after that.

Import from Excel in C#

I use following code to save data from excel file into table
private void RetrieveAndStoreExcelData(String filePath)
{
String excelConnectionStr = "Provider=Microsoft.Jet.OLEDB.4.0;" + "Data Source='" + filePath + "'; Extended Properties='Excel 8.0;'";
OleDbConnection excelConnection = new OleDbConnection(excelConnectionStr);
excelConnection.Open();
try
{
//Get the name of the first worksheet
DataTable schema = excelConnection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if (schema == null || schema.Rows.Count < 1)
{
throw new Exception("Error: Could not determine the name of the first worksheet.");
}
string firstSheetName = schema.Rows[0]["TABLE_NAME"].ToString();
//Retrieve data from worksheet into reader
string query = "SELECT * FROM [" + firstSheetName + "]";
OleDbCommand command = new OleDbCommand(query, excelConnection);
OleDbDataReader dbReader = command.ExecuteReader(); //IEnumerable
//populate IEnumerable
if (dbReader.HasRows)
{
populateRecords(dbReader);
}
}
finally
{
excelConnection.Close();
}
}
This works fine. But if one of the fields length is greater than 255 characters then it truncates the string to 255 and that is also when that row appears after 10th row in excel sheet.
So, if first 10 rows is having length less than 255, it assumes that all rows will have length less than 255 characters.
So is there any way to solve this?

Use interop and c# to count the rows in an Excel spreadsheet's worksheet with data in

I have just written what has to be considered utterly hideous code to count the rows that contain data in the worksheets called "Data" from all the spreadsheets in a given directory. Here's the code
private const string _ExcelLogDirectoryPath = #"..\..\..\..\Model\ExcelLogs\";
static void Main()
{
var excelLogPaths = Directory.GetFiles(_ExcelLogDirectoryPath, "*.xl*");
var excel = new Microsoft.Office.Interop.Excel.Application();
var excelRowCounts = new Dictionary<string, int>();
foreach (var filePath in excelLogPaths)
{
var spreadsheet = excel.Workbooks.Open(Path.GetDirectoryName(System.Windows.Forms.Application.ExecutablePath) + "/" + filePath);
var worksheet = spreadsheet.Sheets["Data"] as Worksheet;
if (worksheet != null)
{
// var rowCount = UsedRange.Rows.Count - 1; DOES NOT WORK, THE number is bigger than the 'real' answer
var rowCount = 0;
for (var i = 1 ; i < 1000000000; i++)
{
var cell = worksheet.Cells[i, 1].Value2; // "Value2", great name for a property, thanks guys
if (cell != null && cell.ToString() != "") // Very fragile (e.g. skipped rows will break this)
{
rowCount++;
}
else
{
break;
}
}
var name = spreadsheet.Name.Substring(spreadsheet.Name.IndexOf('p'), spreadsheet.Name.IndexOf('.') - spreadsheet.Name.IndexOf('p'));
excelRowCounts.Add(name, rowCount - 1);
}
}
I cannot believe this is the right way to do this. It is crazy slow and includes calls to properties with names like Value2 that do not feel like an intended part of a public API. But the method suggested elsewhere dramatically over reports the number of rows (with data in them).
What is the correct was to count the rows with data in them from an Excel worksheet?
========== EDIT 1 ==========
The reason that both UsedRange.Rows.Count and Sid's ACE.OLEDB solution over report the number of rows appears to be a pink background colour that is applied to some of the columns (but only extending to row 7091). Is there a simple/elegant way to count the rows with data in them (i.e. with non-null cell values) regardless of the display colour?
========== EDIT 2 ===========
Sid's ACE.OLEDB solution with the addition he suggests so that the tSQL line reads
var sql = "SELECT COUNT (*) FROM [" + sheetName + "$] WHERE NOT F1 IS NULL";
works. I'll mark that as the answer.
This should do the trick. You can call it with each filename to retrieve the number of rows.
private string GetNumberOfRows(string filename, string sheetName)
{
string connectionString;
string count = "";
if (filename.EndsWith(".xlsx"))
{
connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + filename + ";Mode=ReadWrite;Extended Properties=\"Excel 12.0;HDR=NO\"";
}
else if (filename.EndsWith(".xls"))
{
connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filename + ";Mode=ReadWrite;Extended Properties=\"Excel 8.0;HDR=NO;\"";
}
string SQL = "SELECT COUNT (*) FROM [" + sheetName + "$]";
using (OleDbConnection conn = new OleDbConnection(connectionString))
{
conn.Open();
using (OleDbCommand cmd = new OleDbCommand(SQL, conn))
{
using (OleDbDataReader reader = cmd.ExecuteReader())
{
reader.Read();
count = reader[0].ToString();
}
}
conn.Close();
}
return count;
}
There might be an even faster way of retrieving just the row count, but I know this works.
if you use interop is why don't use UsedRange?
_Worksheet.UsedRange.Rows.Count

Categories

Resources