cannot read all the columns from csv using oledb

I have a csv file with the following header:
"Pickup Date","Pickup Time","Pickup Address","From Zone", and so on..
Using OLE DB, I can only read the first two columns and nothing beyond them, even though I used a schema.ini file with all column names specified. Any suggestions?
Here is my sample csv.
"PickupDate","PickupTime","PickupAddress","FromZone"
"11/05/15","4:00:00 AM","9 Houston Rd, CityName, NC 28262,","262"
Here is my code:
Schema.ini
-----------
[ReportResults.csv]
ColNameHeader = True
Format = CSVDelimited
col1=Pickup Date DateTime
col2=Pickup Time Text width 100
col3=Pickup Address Text width 500
col4=FromZone short
oledb code
-----------
public static DataTable SelectCSV(string path, string query)
{
// since the file contains addresses with , the delimiter ", is used. Each cell is written within "" in the file.
var strConn = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + path +
"; Extended Properties='text;HDR=Yes;FMT=Delimited(\",)'";
OleDbConnection selectConnection = (OleDbConnection)null;
OleDbDataAdapter oleDbDataAdapter = (OleDbDataAdapter)null;
selectConnection = new OleDbConnection(strConn);
selectConnection.Open();
using(OleDbCommand cmd=new OleDbCommand(query,selectConnection))
using (oleDbDataAdapter = new OleDbDataAdapter(cmd))
{
DataTable dt = new DataTable();
dt.Locale=CultureInfo.CurrentCulture;
oleDbDataAdapter.Fill(dt);
return dt;
}
}

Every column is enclosed in double quotes, so a comma inside the quotes is not treated as a delimiter.
So you can import your file:
without using schema.ini
specifying EXTENDED PROPERTIES='text;HDR=Yes;FMT=Delimited' in your connection string
If you need a schema to solve other problems, note that your schema.ini is not correct: its column names contain spaces, while the header in your sample file does not. Use something like this:
[ReportResults.csv]
ColNameHeader = True
Format = CSVDelimited
col1=PickupDate DateTime
col2=PickupTime Text width 100
col3=PickupAddress Text width 500
col4=FromZone short
If you have problems extracting the DateTime column, specify the DateTimeFormat option; for example, if your pickup date looks like 2015/11/13, specify DateTimeFormat=yyyy/MM/dd.
If you have problems extracting the Short column, verify that FromZone is an integer between -32768 and 32767; if not, use a different type. You can also set the DecimalSymbol option if you have problems with decimal separators.
You can find more info on MSDN.
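A minimal sketch of the schema-less variant described above, limited to building the connection string and query. The class and method names (CsvConnection, BuildConnectionString, BuildQuery) are hypothetical. It assumes the usual Jet text-driver convention that Data Source points at the folder containing the file and that the file name is used as the table name; the original question does not show what its path argument holds, so treat that as an assumption.

```csharp
using System;
using System.IO;

public static class CsvConnection
{
    // Builds a Jet text-driver connection string for the folder that holds the CSV file.
    // Assumption: Data Source must be the directory, not the file itself.
    public static string BuildConnectionString(string csvFilePath, bool hasHeader)
    {
        string folder = Path.GetDirectoryName(Path.GetFullPath(csvFilePath));
        return "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + folder +
               ";Extended Properties='text;HDR=" + (hasHeader ? "Yes" : "No") +
               ";FMT=Delimited'";
    }

    // The file name acts as the table name; brackets protect the dot in the name.
    public static string BuildQuery(string csvFilePath)
    {
        return "SELECT * FROM [" + Path.GetFileName(csvFilePath) + "]";
    }
}
```

The two strings could then be passed to the OleDbConnection and OleDbCommand used in the question. The Jet provider is 32-bit only, so the calling process would need to run as x86.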

Related

C# syntax help, date formatting and adding " to strings

This is my first foray into C# as a SSIS and Informatica developer living only in SQL. I have a script task that is reading data from a single SQL Server table via Query and simply writing that data to a text file. Everything works except what I think are two small formatting problems I can't figure out.
The following requirements are in place for this build. Thanks in advance I'm here to answer any questions!
SQL query is purposefully set as a Select * to pick up any new columns added(already in code)
First 3 columns excluded from write to file(already in code)
Problems:
" " wrappers need to be added to all values, column and rows.
Date in database is true Date but when writing to file it shows Datetime. Needs to be only date.
Current:
ID
Name
Date
Ratio
12345678
John Wayne
12/31/2018 12:00:00 AM
1/1
Needs to be:
"ID"
"Name"
"Date"
"Ratio"
"12345678"
"John Wayne"
"2018-12-31"
"1/1"
Code:
// Declare Variables
string DestinationFolder = Dts.Variables["User::Target_FilePath"].Value.ToString();
string QueryStage = Dts.Variables["User::Query_Stage"].Value.ToString();
//string TableName = Dts.Variables["User::TableName"].Value.ToString();
string FileName = Dts.Variables["User::OutputFileName"].Value.ToString();
string FileDelimiter = Dts.Variables["User::Target_FileDelim"].Value.ToString();
//string FileExtension = Dts.Variables["User::AC_Prefix"].Value.ToString();
//USE ADO.NET Connection from SSIS Package to get data from table
SqlConnection myADONETConnection = new SqlConnection();
myADONETConnection = (SqlConnection)(Dts.Connections["ADO_TEST_CONN"].AcquireConnection(Dts.Transaction) as SqlConnection);
// Read data from table or view to data table
string query = QueryStage;
SqlCommand cmd = new SqlCommand(query, myADONETConnection);
//myADONETConnection.Open();
DataTable d_table = new DataTable();
d_table.Load(cmd.ExecuteReader());
myADONETConnection.Close();
string FileFullPath = DestinationFolder + "\\" + FileName + ".txt";
StreamWriter sw = null;
sw = new StreamWriter(FileFullPath, false);
// Write the Header Row to File
int ColumnCount = d_table.Columns.Count;
for (int ic = 4; ic < ColumnCount; ic++)
{
sw.Write(d_table.Columns[ic]);
if (ic < ColumnCount - 1)
{
sw.Write(FileDelimiter);
}
}
sw.Write(sw.NewLine);
// Write All Rows to the File
foreach (DataRow dr in d_table.Rows)
{
for (int ir = 4; ir < ColumnCount; ir++)
{
if (!Convert.IsDBNull(dr[ir]))
{
sw.Write(dr[ir].ToString());
}
if (ir < ColumnCount - 1)
{
sw.Write(FileDelimiter);
}
}
sw.Write(sw.NewLine);
}
sw.Close();
Dts.TaskResult = (int)ScriptResults.Success;

Blindly calling .ToString() on an object, as in the line sw.Write(dr[ir].ToString());, uses the default conversion of that data type to a string. If the value is a DateTime (the C# data type, not the SQL Date column type), the result includes the time information.
C# converts SQL column types (such as Date) into C# data types (DateTime). You need to detect this, just as you're detecting if a value is DBNull.
if (!Convert.IsDBNull(dr[ir]))
{
if (dr[ir] is DateTime dt)
{
// use DateTime's specific string rendering
sw.Write(dt.ToString("d"));
}
else
{
// fall back to standard string rendering
sw.Write(dr[ir].ToString());
}
}
You can change out the format ("d" in this case) to be something else if you need a different format. Keep in mind that the Culture of a computer will affect how the string is rendered, unless you explicitly use a named Culture.
The other part of your problem is adding quotes around the written values. This can be done with string concatenation. For example:
string result = "\"" + "my string" + "\"";
// result is "my string", with quotes
Remember to escape the quote mark.
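Putting the two suggestions together, a small helper could produce the quoted, date-only output the question asks for. This is only a sketch under a few assumptions: the helper name CsvField.Format is made up, nulls are written as an empty quoted field (the original code writes nothing for them), embedded quotes are doubled, and the date format is fixed to yyyy-MM-dd with the invariant culture so the result does not depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;

public static class CsvField
{
    // Renders one cell value as a double-quoted field.
    public static string Format(object value)
    {
        string text;
        if (value == null || value is DBNull)
        {
            // Assumption: NULL becomes an empty quoted field.
            text = string.Empty;
        }
        else if (value is DateTime)
        {
            // Date only, independent of the current culture.
            text = ((DateTime)value).ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        }
        else
        {
            text = Convert.ToString(value, CultureInfo.InvariantCulture);
        }

        // Double any embedded quote so the field stays well-formed.
        return "\"" + text.Replace("\"", "\"\"") + "\"";
    }
}
```

In the loops from the question, sw.Write(CsvField.Format(dr[ir])) would replace the null check and the plain ToString() call, and sw.Write(CsvField.Format(d_table.Columns[ic].ColumnName)) would quote the header names.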

OdbcConnection Text Driver ignores scheme.ini settings

Here is my code:
OdbcConnection conn = new OdbcConnection("Driver={Microsoft Text Driver (*.txt; *.csv)};DSN=scrapped.csv");
conn.Open();
OdbcCommand foo = new OdbcCommand(@"SELECT * FROM [scrapped.csv] WHERE KWOTA < 100.00", conn);
IDataReader dr = foo.ExecuteReader();
StreamWriter asd = new StreamWriter("outfile.txt");
while (dr.Read())
{
int cols = dr.GetSchemaTable().Rows.Count;
for (int i = 0; i < cols; i++)
{
asd.Write(string.Format("{0};",dr[i].ToString()));
}
asd.WriteLine();
}
asd.Flush();
asd.Close();
dr.Close();
conn.Close();
Here is my Scheme.ini
[scrapped.csv]
Format=Delimited(;)
NumberDigits=2
CurrencyThousandSymbol=
CurrencyDecimalSymbol=,
CurrencyDigits=2
Col1=DataOperacji Date
Col2=DataKsiegowania Date
Col3=OpisOperacji Text
Col4=Tytul Text
Col5=NadawcaOdbiorca Text
Col6=NumerKonta Text
Col7=Kwota Currency
Col8=SaldoPoOperacji Currency
Here I have sample from my CSV:
2013-01-22;2013-08-24;notmatter;"notmatter";"notmatter";'notmatter';7 111,55;10 222,20;
2013-03-26;2013-08-23;notmatter;"notmatter";"notmatter";'notmatter';-275,00;15 466,24;
So even though I have date and currency settings in scheme.ini and in my regional settings (which ODBC should use by default, but does not), the values I write to the output file are a total mess.
They are empty if the value contains a space (my local thousands separator), and a value like 15,45 comes out as 15,4500 instead.
Date fields also behave abnormally, and even if I add DateTimeFormat to scheme.ini, the output does not follow the format I specified.
Any help would be appreciated. I would like to use ODBC and query the CSV data like a database, with WHERE something = something.

I added a line to your schema.ini and ran it against an ADODB connection. That fixed the dates for me, but the other parts are still not right. Note the DateTimeFormat line.
[scrapped.csv]
Format=Delimited(;)
NumberDigits=2
CurrencyThousandSymbol=
CurrencyDecimalSymbol=,
CurrencyDigits=2
DateTimeFormat="yyyy-mm-dd"
Col1=DataOperacji Date
Col2=DataKsiegowania Date
Col3=OpisOperacji Text
Col4=Tytul Text
Col5=NadawcaOdbiorca Text
Col6=NumerKonta Text
Col7=Kwota Currency
Col8=SaldoPoOperacji Currency
You may also need:
ColNameHeader=False
MaxScanRows=0
But at the moment, I cannot see a way to get a space accepted as the CurrencyThousandSymbol
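One possible workaround, not part of the answer above, would be to declare the two amount columns as Text in schema.ini and convert them in C# after reading. The sketch below assumes that approach; the class name AmountParser is hypothetical, and the separators (space for thousands, comma for decimals) are taken from the sample rows in the question.

```csharp
using System.Globalization;

public static class AmountParser
{
    // Number format matching values such as "7 111,55" from the sample CSV.
    private static readonly NumberFormatInfo Info = new NumberFormatInfo
    {
        NumberGroupSeparator = " ",
        NumberDecimalSeparator = ","
    };

    // Parses an amount that was read from the CSV as plain text.
    public static decimal Parse(string text)
    {
        return decimal.Parse(text.Trim(), NumberStyles.Number, Info);
    }
}
```

The trade-off is that a numeric filter such as WHERE KWOTA < 100.00 can no longer be expressed in the SQL text and would have to be applied in code after parsing.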

CSV is actually .... Semicolon Separated Values ... (Excel export on AZERTY)

I'm a bit confused here.
When I use Excel 2003 to export a sheet to CSV, it actually uses semicolons ...
Col1;Col2;Col3
shfdh;dfhdsfhd;fdhsdfh
dgsgsd;hdfhd;hdsfhdfsh
Now when I read the CSV using the Microsoft drivers, they expect commas and see the list as one big column.
I suspect Excel is exporting with semicolons because I have an AZERTY keyboard. However, shouldn't the CSV reader then also take the different delimiter into account?
How can I find out the appropriate delimiter, and/or read the CSV properly?
public static DataSet ReadCsv(string fileName)
{
DataSet ds = new DataSet();
string pathName = System.IO.Path.GetDirectoryName(fileName);
string file = System.IO.Path.GetFileName(fileName);
OleDbConnection excelConnection = new OleDbConnection
(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + pathName + ";Extended Properties=Text;");
try
{
OleDbCommand excelCommand = new OleDbCommand(@"SELECT * FROM " + file, excelConnection);
OleDbDataAdapter excelAdapter = new OleDbDataAdapter(excelCommand);
excelConnection.Open();
excelAdapter.Fill(ds);
}
catch (Exception exc)
{
throw exc;
}
finally
{
if(excelConnection.State != ConnectionState.Closed )
excelConnection.Close();
}
return ds;
}

One way would be to just use a decent CSV library; one that lets you specify the delimiter:
using (var csvReader = new CsvReader("yourinputfile.csv"))
{
csvReader.ValueSeparator = ';';
csvReader.ReadHeaderRecord();
while (csvReader.HasMoreRecords)
{
var record = csvReader.ReadDataRecord();
var col1 = record["Col1"];
var col2 = record["Col2"];
}
}

Check what delimiter is specified on your computer. Control Panel > Regional and Language Options > Regional Options tab - click Customize button. There's an option there called "List separator". I suspect this is set to semi-colon.
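The same setting can be read from code and turned into a schema.ini Format line so that the text driver uses the delimiter Excel wrote. This is a rough sketch: the class and method names are made up, and it assumes the export and the import run under the same regional settings, since CultureInfo.CurrentCulture only describes the machine the code runs on.

```csharp
using System.Globalization;

public static class SchemaIni
{
    // Maps a list separator to the matching schema.ini Format line.
    public static string FormatLine(string listSeparator)
    {
        if (listSeparator == ",") return "Format=CSVDelimited";
        if (listSeparator == "\t") return "Format=TabDelimited";
        return "Format=Delimited(" + listSeparator + ")";
    }

    // Uses the "List separator" of the current user's regional settings.
    public static string FormatLineForCurrentCulture()
    {
        return FormatLine(CultureInfo.CurrentCulture.TextInfo.ListSeparator);
    }
}
```

Writing that line under a [filename.csv] section in a schema.ini next to the file should make the Jet text driver split on the same character.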

Solution for German Windows 10:
Remember to change the decimal separator to . as well, and perhaps the thousands separator to a thin space.
I can't believe this is true: comma-separated values are separated by semicolons?

As mentioned by dendarii, the CSV separator that Excel uses is determined by your regional settings, specifically the 'list separator' character.
(In my opinion Excel is wrong to do this, as the format is called comma-separated values.)
However, if that still does not solve your issue, there is another possible complication:
Check your 'digit grouping' character and ensure that it is NOT a comma.
Excel appears to fall back to a semicolon when exporting decimal numbers if digit grouping is also set to a comma.
Setting the digit grouping to a full stop / period (.) solved this for me.

Parsing a CSV file problems C#

Having a problem with parsing a CSV file. I connect to the file using the following:
string connString = "Provider=Microsoft.Jet.OLEDB.4.0;"
+ "Data Source=\"" + dir + "\\\";"
+ "Extended Properties=\"text;HDR=No;FMT=Delimited\"";
//create the database query
string query = "SELECT * FROM [" + file + "]";
//create a DataTable to hold the query results
DataTable dTable = new DataTable();
//create an OleDbDataAdapter to execute the query
OleDbDataAdapter dAdapter = new OleDbDataAdapter(query, connString);
//Get the CSV file to change position.
//fill the DataTable
dAdapter.Fill(dTable);
return dTable;
For some reason, the first column reads as a "Header" ok (i.e. HDR=Yes allows the values to be displayed). The problem is when I have HDR=No, nothing after the first 'cell' is displayed in that row. However I need to have HDR=No as I'll be writing the CSV later.
As a quick aside, the rest of the row only has a value in every other column. Also, there is a period in each of these columns. Any help?
Cheers.
EDIT: Here are a fake few lines similar to the CSV:
//Problem row->>
File:,GSK1.D,,GSK2.D,,GSK3.D,
//The following rows, however, are fine:
/ 69,120.3,16.37%,128.9,7.16%,188.92,13.97%
D / 71,48.57,75.50%,32.15,26.65%,58.35,71.43%
T / 89,35.87,45.84%,50.01,28.87%,15.38,43.30%
EDIT: When I put any value into the "blank spaces" above they are parsed, but no matter what I put into the problematic cells (e.g. GSK1.D) they won't parse - unless it is a number! Is there any chance it is automatically converting this cell to a "float" cell? And how can I stop it doing this?

CodeProject has a parsing library: http://www.codeproject.com/KB/database/CsvReader.aspx
It comes with an interesting article on how this works. According to the author, it is faster than the OLE DB provider.

I have finished this, just to let anyone know who may have this problem in the future. It turns out the reason there was nothing being taken in was because ADO tries to determine a column type. If other values in this column are not of said type, it removes them completely.
To counter this, you need to create a schema.ini file, like so:
StreamWriter writer = new StreamWriter(File.Create(dir + "\\schema.ini"));
writer.WriteLine("[" + fileToBeRead + "]");
writer.WriteLine("ColNameHeader = False");
writer.WriteLine("Format = CSVDelimited");
writer.WriteLine("CharacterSet=ANSI");
int iColCount = dTable.Columns.Count + 1;
for (int i = 1; i < iColCount; i++)
{
writer.WriteLine("Col" + i + "=Col" + i + "Name Char Width 20");
}
//writer.WriteLine("Col1=Col1Name Char Width 20");
//writer.WriteLine("Col2=Col1Name Char Width 20");
//etc.
writer.Close();
Thanks for everyone's suggestions!

I've seldom done well with database type access to text files - the possibilities for "issues" with the file tend to exceed theoretical time savings.
Personally I've more often than not hand crafted the code to do this. A lot (going back over 20+ years so generic solutions have been thin on the ground). That said, if I were having to process a .csv file now the first thing I'd reach for would be FileHelpers or similar.

Microsoft's Microsoft Text Driver sees text as float operation

Using .NET
I have a text file with comma separated data. One of the columns consists of text like the following : 1997/020269/07
Now when I do a select with an OdbcCommand the string is seen as a float and it returns the 'answer' instead of the actual text!
How can I get the actual text? Am I going to be forced to parsing the file manually?
Hope someone can help...please?! :)
Edit: Some code maybe? :)
string strConnString =
@"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" + _FilePath +
@"; Extensions=asc,csv,tab,txt;Persist Security Info=False";
var conn = new System.Data.Odbc.OdbcConnection(strConnString);
var cmd = new System.Data.Odbc.OdbcCommand("select MyColumn from TextFile.txt", conn);
var reader = cmd.ExecuteReader();
while (reader.Read())
{ Console.WriteLine(reader["MyColumn"]); }
This returns 0.014074977 instead of 1997/020269/07

Have you tried using a schema.ini file? It can be used to explicitly define the format of the text file, including data types.
Your schema.ini file might end up looking a little like:
[sourcefilename.txt]
ColNameHeader=true
Format=CSVDelimited
Col1=MyColumn Text Width 14
Col2=...

Try using schema.ini
[yourfile.txt]
ColNameHeader=false
MaxScanRows=0
Format=FixedLength
Col1=MyColumn Text Width 20
Bye.
