Get distinct values from column by column - c#

I need to get the distinct values of each column and store them in a Dictionary (or array) using Excel Interop. I tried the following code, but it does not use Excel Interop.
var excel = new ExcelQueryFactory("worksheetFileName");
var distinctNames = (from row in excel.WorkSheet() select row["ColB"]).Distinct();
Please provide the Excel.Interop snippet/code to get distinct values column by column and store in array.

For this operation it does not make sense to use Excel automation; unless there is a sound reason for automation, the prudent course of action is to work with OleDb.
In the example below, the first function creates a connection string that can be used in any project, and the second reads the data.
With Excel automation, COM objects may not be released if the application crashes or if the code is not written carefully (I call this the two-dot rule: chained calls create intermediate automation objects that you hold no reference to and therefore cannot release). This does not happen with OleDb. If you need formatting, on the other hand, automation is the way to go.
public string ConnectionString(string FileName, string Header)
{
OleDbConnectionStringBuilder Builder = new OleDbConnectionStringBuilder();
if (System.IO.Path.GetExtension(FileName).ToUpper() == ".XLS")
{
Builder.Provider = "Microsoft.Jet.OLEDB.4.0";
Builder.Add("Extended Properties", string.Format("Excel 8.0;IMEX=1;HDR={0};", Header));
}
else
{
Builder.Provider = "Microsoft.ACE.OLEDB.12.0";
Builder.Add("Extended Properties", string.Format("Excel 12.0;IMEX=1;HDR={0};", Header));
}
Builder.DataSource = FileName;
return Builder.ConnectionString;
}
The following code reads the first column in Sheet2 and returns its distinct values as a List<string>. In this case the column holds dates, and the file resides in the same folder as the application executable.
private List<string> DemoDistinct()
{
List<string> dateList = new List<string>();
DataTable dt = new DataTable();
using (OleDbConnection cn = new OleDbConnection { ConnectionString = ConnectionString(System.IO.Path.Combine(Application.StartupPath, "WS1.xlsx"), "Yes") })
{
cn.Open();
using (OleDbCommand cmd = new OleDbCommand
{
CommandText = "SELECT DISTINCT [Dates] FROM [Sheet2$]",
Connection = cn
}
)
{
OleDbDataReader dr = cmd.ExecuteReader();
dt.Load(dr);
dateList = dt
.AsEnumerable()
.Select(row => row.Field<DateTime>("Dates").ToShortDateString()).ToList();
}
}
return dateList;
}
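The function above handles one named column. To cover the "column by column into a Dictionary" part of the question, here is a minimal sketch that works on any DataTable, for example one filled from "SELECT * FROM [Sheet2$]". The class and method names are my own, not part of any library, and cells are simply converted to strings.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ColumnDistinct
{
    // Hypothetical helper: maps each column name to its distinct
    // non-null cell values, kept in first-seen order.
    public static Dictionary<string, List<string>> DistinctByColumn(DataTable table)
    {
        var result = new Dictionary<string, List<string>>();
        foreach (DataColumn column in table.Columns)
        {
            var seen = new HashSet<string>();
            var values = new List<string>();
            foreach (DataRow row in table.Rows)
            {
                if (row.IsNull(column)) continue; // skip empty cells
                string text = Convert.ToString(row[column]);
                if (seen.Add(text)) values.Add(text); // Add returns false for duplicates
            }
            result[column.ColumnName] = values;
        }
        return result;
    }
}
```

Usage would be along the lines of `var distinct = ColumnDistinct.DistinctByColumn(dt);` followed by `distinct["Dates"]`, assuming the sheet has a "Dates" header.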

Related

The Microsoft jet database engine could not find object while reading dbf file

I am facing a very strange issue. I have written a class that reads a DBF file through an OleDb connection. I downloaded a DBF file from the internet, and the class reads all of its data correctly.
DBF file location: E:\Projects\SLAVE.DBF
I am facing the following two issues:
1) When I try to read another DBF file, only the table's field names are read; the field data is not.
E:\Projects\line75.dbf
2) When I put my own DBF files in that location, I get this exception:
microsoft jet database engine does not find required object. Are you missing some directive or path. E:\Projects\SDW_plnParcel.dbf
I am totally confused: why does it read SLAVE.DBF (downloaded from the internet) correctly, read only the field names of line75.dbf, and throw an exception on SDW_plnParcel.dbf?
My class and one of its functions are as follows:
public class dbfHandler
{
public dbfHandler()
{
this.dbfTable = new DataTable();
}
public void initconnection(String filepath) // initialise dbconnection
{
String[] splitString = filepath.Split('\\');
this.filename = splitString[splitString.Length - 1];
splitString = splitString.Where(w => w != splitString[splitString.Length - 1]).ToArray();
String folderPath = String.Join("\\", splitString);
this.dbConnection = new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + folderPath + ";Extended Properties=dBase III");
this.dbConnection.Open();
}
public List<String> getcolumnvalues(int fieldIndex, List<int> rowIndexes)
{
List<String> columnvalues = new List<string>();
try
{
if(this.dbConnection.State == ConnectionState.Open)
{
string mySQL = "select * from " + this.filename; // dbf table name
OleDbCommand MyQuery = new OleDbCommand(mySQL, this.dbConnection);
OleDbDataReader reader = MyQuery.ExecuteReader();
int rowCount = 0;
while(reader.Read())
{
bool match = rowIndexes.Any(item => item == rowCount);
if(match == true)
{
String value = reader.GetValue(fieldIndex).ToString();
columnvalues.Add(value);
}
rowCount++;
}
reader.Close();
}
}
catch(Exception e)
{
throw; // rethrow without resetting the stack trace
}
return columnvalues;
}
private String filename;
private DataTable dbfTable;
private OleDbConnection dbConnection;
}
When dealing with .DBF files, I have always had better results with Microsoft's Visual FoxPro OleDb provider.
The connection string, in simplified form:
var connString = @"Provider=VFPOLEDB.1;Data Source=C:\SomePathToData;";
Now, instead of using the data reader, try a DataAdapter, just to make sure you can see the data you are expecting:
var da = new OleDbDataAdapter(yourSqlCmdObject); // assumes yourSqlCmdObject.Connection is already set to yourConnection
var dt = new DataTable();
da.Fill(dt);
It should pull all columns and all rows from your query into properly typed data columns. You can then cycle through the column names, rows, etc.
foreach( DataColumn dc in dt.Columns )
{
var tmp = dc.ColumnName;
}
foreach( DataRow dr in dt.Rows )
{
object x = dr[0]; // get VALUE from column 0
x = dr["SpecificColumn"]; // if you KNOW the column name
}
You can tweak this as needed. If you only need a specific column (or a few columns), change your query to select just those.
Select OneField from YourTable...

I can not read an Excel cell having a leading apostrophe within it

I ran into the following problem. I am trying to read data from an Excel file, all as strings, using the code below.
try
{
var connectionString = string.Format( "Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=\"Excel 8.0;HDR=YES;IMEX=1\"", session["FilePath"] );
using (var adapter = new System.Data.OleDb.OleDbDataAdapter( "SELECT * FROM [Sheet1$]", connectionString ))
{
var ds = new DataSet();
adapter.Fill( ds, "workBook" );
workBook = ds.Tables["workBook"];
}
if (workBook == null)
throw new Exception( "Could not load imported spreadsheet!" );
if (workBook.Rows.Count <= 0)
throw new Exception( "You are using an empty spreadsheet!" );
foreach (DataColumn column in workBook.Columns)
column.ColumnName = column.ColumnName.Trim();
}
catch (Exception exc)
{
}
Everything worked fine: I got a DataTable with all values as strings and parsed them at the program level (I have mixed data types in one column). But when a cell has a Number format and its value is, for example, 0589, I need to add a leading apostrophe to the cell, because the leading 0 must be present in a 4-digit number. When I read such an Excel file with the IMEX parameter set to 1, I get a NULL value for this cell. I don't understand why, since I read all data as strings.
Change the number format of the cells to "0000" for a number that will always be 4 digits and retain the leading zeros.
Here is how I fixed it: before loading the spreadsheet, I set the registry key TypeGuessRows to zero at the program level, and after loading I set it back to 8 (in case other programs use it).
string file = "C:\\temp\\Exposure\\UTC.xlsx";
OleDbConnectionStringBuilder connStringBuilder = new OleDbConnectionStringBuilder();
connStringBuilder.DataSource = file;
connStringBuilder.Provider = "Microsoft.ACE.OLEDB.12.0";
connStringBuilder.Add("Extended Properties", "Excel 8.0;HDR=NO;IMEX=1");
DbProviderFactory factory = DbProviderFactories.GetFactory("System.Data.OleDb");
DbConnection connection = factory.CreateConnection();
connection.ConnectionString = connStringBuilder.ConnectionString;
connection.Open();
// var myTableName = connection.GetSchema("Tables").Rows[0]["TABLE_NAME"];
DbCommand selectCommand = factory.CreateCommand();
string sql = "SELECT * FROM [Daily Monitoring$]";
selectCommand.CommandText = sql;
selectCommand.Connection = connection;
DbDataAdapter adapter = factory.CreateDataAdapter();
adapter.SelectCommand = selectCommand;
DataSet data = new DataSet();
adapter.Fill(data);
DataTable dt = data.Tables[0];
connection.Close();
string ss = dt.Rows[1][1].ToString();

Reading from Excel File

I have seen many examples of this around but something isn't working for me.
What I am looking to do is read an Excel file, given a sheet name, and store the values in Lists.
For example, say I have an excel file that looks like:
First Second Third
f1 s1 t1
f2 s2 t2
f3 s3 t3
Each row is to be considered a set of values.
This is what I have so far:
List<string> ColumnNames= GetColumnNames();
using (OleDbConnection OleDbConn = new OleDbConnection(Path))
{
OleDbConn.Open();
String cmdString = "SELECT * FROM [" + sheetName+ "]";
OleDbCommand cmd = new OleDbCommand(cmdString, OleDbConn);
DataTable dt = new DataTable();
List<ValueSet> sets = new List<ValueSet>();
Dictionary<string, Value> values = new Dictionary <string,value>()
ValueSet valueset = new ValueSet(null);
using (OleDbDataReader oleRdr = cmd.ExecuteReader())
{
while (oleRdr.Read())
{
for (int i = 0; i < ColumnNames.Count; i++)
{
ColumnName cn = new ColumnName(columnNames[i]);
string data= oleRdr[f.Name].ToString();
Value value = new Value(data, f);
if (!values.ContainsKey(ColumnNames[i]))
{
values.Add(ColumnNames[i], value);
}
else
{
values[ColumnNames[i]] = value;
}
}
valueSet= new ValueSet(values);
sets.Add(valueSet);
}
return sets;;
}
I've gotten weird results with certain files using an OleDbConnection.
I suggest http://www.codeproject.com/Articles/11698/A-Portable-and-Efficient-Generic-Parser-for-Flat-F
With this you can read your CSV into a datatable and parse it into a list as follows:
DataTable dtPrereg;
using (GenericParserAdapter gp = new GenericParserAdapter(Server.MapPath("prereg.csv"), Encoding.UTF8))
{
gp.FirstRowHasHeader = true;
dtPrereg = gp.GetDataTable();
}
I haven't tested this on tab delimited files, but it should work the same (or you could convert your file to CSV)
If you really have a spreadsheet with a known number of named columns and you want to project them into a List<List<string>> it's a lot easier to just do it with Linq.
e.g.
List<List<string>> data;
using (OleDbDataReader rdr = cmd.ExecuteReader())
{
data = (from row in rdr.Cast<DbDataRecord>()
select new List<string>
{
row["First"].ToString(),
row["Second"].ToString(),
row["Third"].ToString()
}).ToList();
}
try changing
ValueSet= new ValueSet(values);
sets.Add(ValueSet);
to
valueset = new ValueSet(values);
sets.Add(valueset );

How can I extract an MDB file's table contents to text in C#?

A project I'm working on contains an MDB (Access database) file. I'd like to export the contents of the tables to text, but am having a hard time finding a way to do it easily using C#. Is there a faster way than using OLEDB and queries?
Update:
Ideally I'd like to not have to statically name each table (there are hundreds) and I have to use .NET 2.0 or below.
There might be a more efficient way, but you could populate the data into a DataTable, and then export to a text file:
Getting data into the DataTable:
string connString = "Provider=Microsoft.ACE.OLEDB.12.0;data source=C:\\marcelo.accdb";
DataTable results = new DataTable();
using(OleDbConnection conn = new OleDbConnection(connString))
{
OleDbCommand cmd = new OleDbCommand("SELECT * FROM Clientes", conn);
conn.Open();
OleDbDataAdapter adapter = new OleDbDataAdapter(cmd);
adapter.Fill(results);
}
Exporting the DataTable to CSV:
EDIT I haven't tested this, but something like this should work for .NET 2.0.
//initialize the StringBuilder
StringBuilder sb = new StringBuilder();
//append the column names as the header row ('results' is the DataTable filled above)
string[] columns = new string[results.Columns.Count];
for (int i = 0; i < results.Columns.Count; i++)
columns[i] = results.Columns[i].ColumnName;
sb.AppendLine(string.Join(",", columns));
foreach (DataRow row in results.Rows)
{
//append the data for each row in the table
string[] fields = new string[row.ItemArray.Length];
for (int x = 0; x < fields.Length; x++)
fields[x] = row[x].ToString();
sb.AppendLine(string.Join(",", fields));
}
File.WriteAllText("test.csv", sb.ToString());
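One caveat with joining fields by a bare comma: a value that itself contains a comma, a double quote, or a line break will corrupt the row. A minimal sketch of the usual quoting convention follows (wrap the field in double quotes and double any embedded quotes). The `Csv` class and its method names are my own and only illustrative; the code avoids LINQ so it should suit the .NET 2.0 constraint.

```csharp
using System;
using System.Text;

public static class Csv
{
    // Quote a field only when it contains a comma, a quote, or a line break.
    public static string Escape(string field)
    {
        if (field == null) return "";
        bool needsQuotes = field.IndexOfAny(new char[] { ',', '"', '\r', '\n' }) >= 0;
        if (!needsQuotes) return field;
        // Embedded double quotes are doubled, then the whole field is wrapped.
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Join already-stringified fields into one CSV line (without the line ending).
    public static string JoinRow(string[] fields)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.Length; i++)
        {
            if (i > 0) sb.Append(',');
            sb.Append(Escape(fields[i]));
        }
        return sb.ToString();
    }
}
```

In the export loop, `sb.AppendLine(Csv.JoinRow(fields));` would then stand in for the `string.Join(",", fields)` call.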
No obvious way comes to mind. Just write something that iterates through the tables and spits out the data in whatever text format you want (.csv, tab delimited, etc).
You could always write it in VBA inside of Access, but I don't know if that would make it faster or slower.
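For the "iterates through the tables" part, the table list can come from the connection's schema rather than being named statically. The sketch below is only the filtering step and assumes a schema DataTable with TABLE_NAME and TABLE_TYPE columns, such as the one returned by `GetOleDbSchemaTable(OleDbSchemaGuid.Tables, ...)`; the `SchemaHelper` name is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class SchemaHelper
{
    // Returns the names of ordinary user tables from a "Tables" schema DataTable.
    // Assumption: rows expose TABLE_NAME and TABLE_TYPE, and user tables
    // carry the type "TABLE" (system tables and views carry other types).
    public static List<string> UserTableNames(DataTable schema)
    {
        List<string> names = new List<string>();
        foreach (DataRow row in schema.Rows)
        {
            string name = Convert.ToString(row["TABLE_NAME"]);
            string type = Convert.ToString(row["TABLE_TYPE"]);
            // Also skip Access system tables by their MSys prefix, to be safe.
            if (type == "TABLE" && !name.StartsWith("MSys", StringComparison.OrdinalIgnoreCase))
                names.Add(name);
        }
        return names;
    }
}
```

Each returned name could then be placed, in square brackets, into a "SELECT * FROM [name]" query and exported in turn.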
If you want to go the Interop route, you can do it in a single command with the Access TransferText method:
using Access = Microsoft.Office.Interop.Access;
using System.Runtime.InteropServices;
static void ExportToCsv(string databasePath, string tableName, string csvFile) {
Access.Application app = new Access.Application();
app.OpenCurrentDatabase(databasePath);
Access.DoCmd doCmd = app.DoCmd;
doCmd.TransferText(Access.AcTextTransferType.acExportDelim, Type.Missing, tableName, csvFile, true);
app.CloseCurrentDatabase();
Marshal.FinalReleaseComObject(doCmd);
doCmd = null;
app.Quit();
Marshal.FinalReleaseComObject(app);
app = null;
}
I do not know C#, but here is another, rather rough, idea. It uses Microsoft.Office.Interop.Access.Dao:
DBEngine dbEng = new DBEngine();
Workspace ws = dbEng.CreateWorkspace("", "admin", "",
WorkspaceTypeEnum.dbUseJet);
Database db = ws.OpenDatabase("z:\\docs\\test.accdb", false, false, "");
foreach (TableDef tdf in db.TableDefs)
{
string tablename=tdf.Name;
if (!tablename.StartsWith("MSys")) // skip system tables; unlike Substring(0,4), this is safe for names shorter than four characters
{
string sSQL = "SELECT * INTO [Text;FMT=Delimited;HDR=Yes;DATABASE=Z:\\Docs].[out_"
+ tablename + ".csv] FROM " + tablename;
db.Execute(sSQL);
}
}

fetch column names for specific table

I want to fetch all the column names for a specific table.
I am using MS Access and C# .NET 2008.
You can fetch schema information for a given query through OleDb using the SchemaOnly CommandBehavior and the GetSchemaTable method, as follows:
var conStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=NWIND.mdb";
using (var con = new OleDbConnection(conStr))
{
con.Open();
using (var cmd = new OleDbCommand("select * from Suppliers", con))
using (var reader = cmd.ExecuteReader(CommandBehavior.SchemaOnly))
{
var table = reader.GetSchemaTable();
var nameCol = table.Columns["ColumnName"];
foreach (DataRow row in table.Rows)
{
Console.WriteLine(row[nameCol]);
}
}
}
A variant of bubi's method for a specific table:
public List<string> GetTableColumnNames(string tableName)
{
var conStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=NWIND.mdb";
using (var connection = new OleDbConnection(conStr))
{
connection.Open();
var schemaTable = connection.GetOleDbSchemaTable(
OleDbSchemaGuid.Columns,
new Object[] { null, null, tableName });
if (schemaTable == null)
return null;
var columnOrdinalForName = schemaTable.Columns["COLUMN_NAME"].Ordinal;
return (from DataRow r in schemaTable.Rows select r.ItemArray[columnOrdinalForName].ToString()).ToList();
}
}
Of course first you might want to check if the table actually exists before getting its column names:
public bool TableExists(string tableName)
{
var conStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=NWIND.mdb";
using (var connection = new OleDbConnection(conStr))
{
connection.Open();
var tables = connection.GetSchema("Tables");
var tableExists = false;
for (var i = 0; i < tables.Rows.Count; i++)
{
tableExists = String.Equals(tables.Rows[i][2].ToString(),
tableName,
StringComparison.CurrentCultureIgnoreCase);
if (tableExists)
break;
}
return tableExists;
}
}
This retrieves all the columns of all tables and views
DataTable schemaTable = ((OleDbConnection)jetConnection).GetOleDbSchemaTable(
System.Data.OleDb.OleDbSchemaGuid.Columns,
new object[] { null, null, null, null });
I found this article while trying to build a C# application to migrate an Access database. The database I'm migrating is an Access 2007/2010 file with .accdb extension.
If you use this code on a table that has Memo or Attachment columns (available in accdb files), it will return the type of these columns as string (wchar).
I had trouble finding much information about how to deal with these types of columns, so I wanted to provide a link to the article that helped me figure out how to handle them:
https://social.msdn.microsoft.com/Forums/vstudio/en-US/d15606f9-f38d-4a1b-8ce3-000c558e79c5
I took the bottom example in that thread and converted it to C#. I did have to add this using statement to the module to avoid having to edit all of the references to "AccessDao":
using AccessDao = Microsoft.Office.Interop.Access.Dao;
My apologies for tacking onto an old thread, but I used this thread as a starting point for writing my code and didn't realize this gotcha right away.
Here's code to get the column names in the order they appear in the Access table. The examples in the other answers here return the column names in alphabetical order (at least for me... using the Microsoft Access Database Engine 2016 Redistributable and .NET Core 3.1).
Based on qnaninf's code example:
var schemaTable = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Columns, new object[] { null, null, tableName });
var columnOrdinalForName = schemaTable.Columns["COLUMN_NAME"].Ordinal;
var columnOrdinalForOrdinal = schemaTable.Columns["ORDINAL_POSITION"].Ordinal;
var rows = schemaTable.Rows;
var columns = from DataRow r in schemaTable.Rows
orderby r.ItemArray[columnOrdinalForOrdinal]
select new
{
Ordinal = r.ItemArray[columnOrdinalForOrdinal].ToString(),
ColumnName = r.ItemArray[columnOrdinalForName].ToString()
};
You can get the column names from an MS Access database in VB.NET with OleDb as follows.
'In Vb.net with OleDb
Dim adapter As new OleDb.OleDbDataAdapter
Dim ds As New DataSet
cmd.CommandText = "select * from table_name where 1=2"
adapter.SelectCommand = cmd
adapter.Fill(ds)
adapter.Dispose()
cmd.Dispose()
For Each dr In ds.Tables(0).Columns
ComboBox1.Items.Add(dr.ToString) 'The Column name will come in this combobox
Next
