Here is my code:
OdbcConnection conn = new OdbcConnection("Driver={Microsoft Text Driver (*.txt; *.csv)};DSN=scrapped.csv");
conn.Open();
OdbcCommand foo = new OdbcCommand(@"SELECT * FROM [scrapped.csv] WHERE KWOTA < 100.00", conn);
IDataReader dr = foo.ExecuteReader();
StreamWriter asd = new StreamWriter("outfile.txt");
while (dr.Read())
{
    int cols = dr.GetSchemaTable().Rows.Count;
    for (int i = 0; i < cols; i++)
    {
        asd.Write(string.Format("{0};", dr[i].ToString()));
    }
    asd.WriteLine();
}
asd.Flush();
asd.Close();
dr.Close();
conn.Close();
Here is my schema.ini:
[scrapped.csv]
Format=Delimited(;)
NumberDigits=2
CurrencyThousandSymbol=
CurrencyDecimalSymbol=,
CurrencyDigits=2
Col1=DataOperacji Date
Col2=DataKsiegowania Date
Col3=OpisOperacji Text
Col4=Tytul Text
Col5=NadawcaOdbiorca Text
Col6=NumerKonta Text
Col7=Kwota Currency
Col8=SaldoPoOperacji Currency
Here is a sample from my CSV:
2013-01-22;2013-08-24;notmatter;"notmatter";"notmatter";'notmatter';7 111,55;10 222,20;
2013-03-26;2013-08-23;notmatter;"notmatter";"notmatter";'notmatter';-275,00;15 466,24;
Even though the date and currency formats are set both in schema.ini and in my regional settings (which ODBC should use by default, but does not), the values I write to the output file are a total mess.
They are empty if the value contains a space (my local thousands separator), and a value like 15,45 comes out as 15,4500 instead.
Date fields also behave abnormally: even if I add DateTimeFormat to schema.ini, the output looks nothing like the format I specified.
Any help with this would be appreciated. I would like to keep using ODBC and query the CSV data like a database, with WHERE something = something.
I added a line to your schema.ini and ran it against an ADODB connection. That fixed the dates for me, but the other parts are still not right. Note the DateTimeFormat line.
[scrapped.csv]
Format=Delimited(;)
NumberDigits=2
CurrencyThousandSymbol=
CurrencyDecimalSymbol=,
CurrencyDigits=2
DateTimeFormat="yyyy-mm-dd"
Col1=DataOperacji Date
Col2=DataKsiegowania Date
Col3=OpisOperacji Text
Col4=Tytul Text
Col5=NadawcaOdbiorca Text
Col6=NumerKonta Text
Col7=Kwota Currency
Col8=SaldoPoOperacji Currency
You may also need:
ColNameHeader=False
MaxScanRows=0
But at the moment, I cannot see a way to get a space accepted as the CurrencyThousandSymbol
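One possible workaround, sketched here and not verified against the text driver itself: declare Kwota and SaldoPoOperacji as Text in schema.ini and convert the values in C# after reading them. The names AmountText and ParseAmount are hypothetical, and the sketch assumes a space (or non-breaking space) as the thousands separator and a comma as the decimal separator, as in the sample rows.

```csharp
using System;
using System.Globalization;

public static class AmountText
{
    // Hypothetical helper: converts text such as "7 111,55" or "-275,00" to a decimal.
    // Assumes spaces group thousands and a comma marks the decimals.
    public static decimal ParseAmount(string raw)
    {
        // Drop the grouping characters so only sign, digits and the comma remain.
        string cleaned = raw.Replace(" ", "").Replace("\u00A0", "").Trim();

        var format = new NumberFormatInfo { NumberDecimalSeparator = "," };
        return decimal.Parse(
            cleaned,
            NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint,
            format);
    }
}
```

With the columns read as text, the WHERE KWOTA < 100.00 filter would have to move into code, for example by comparing AmountText.ParseAmount(dr["Kwota"].ToString()) against 100.00m inside the read loop.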
Related
This is my first foray into C# as an SSIS and Informatica developer who has lived only in SQL. I have a script task that reads data from a single SQL Server table via a query and simply writes that data to a text file. Everything works except for what I think are two small formatting problems that I can't figure out.
The following requirements are in place for this build. Thanks in advance; I'm here to answer any questions!
SQL query is purposefully set as a Select * to pick up any new columns added(already in code)
First 3 columns excluded from write to file(already in code)
Problems:
Double-quote (" ") wrappers need to be added to all values, both column headers and rows.
The date in the database is a true Date, but when written to the file it shows as a DateTime. It needs to be the date only.
Current:
ID | Name | Date | Ratio
12345678 | John Wayne | 12/31/2018 12:00:00 AM | 1/1
Needs to be:
"ID" | "Name" | "Date" | "Ratio"
"12345678" | "John Wayne" | "2018-12-31" | "1/1"
Code:
// Declare Variables
string DestinationFolder = Dts.Variables["User::Target_FilePath"].Value.ToString();
string QueryStage = Dts.Variables["User::Query_Stage"].Value.ToString();
//string TableName = Dts.Variables["User::TableName"].Value.ToString();
string FileName = Dts.Variables["User::OutputFileName"].Value.ToString();
string FileDelimiter = Dts.Variables["User::Target_FileDelim"].Value.ToString();
//string FileExtension = Dts.Variables["User::AC_Prefix"].Value.ToString();
//USE ADO.NET Connection from SSIS Package to get data from table
SqlConnection myADONETConnection = new SqlConnection();
myADONETConnection = (SqlConnection)(Dts.Connections["ADO_TEST_CONN"].AcquireConnection(Dts.Transaction) as SqlConnection);
// Read data from table or view to data table
string query = QueryStage;
SqlCommand cmd = new SqlCommand(query, myADONETConnection);
//myADONETConnection.Open();
DataTable d_table = new DataTable();
d_table.Load(cmd.ExecuteReader());
myADONETConnection.Close();
string FileFullPath = DestinationFolder + "\\" + FileName + ".txt";
StreamWriter sw = null;
sw = new StreamWriter(FileFullPath, false);
// Write the Header Row to File
int ColumnCount = d_table.Columns.Count;
for (int ic = 4; ic < ColumnCount; ic++)
{
sw.Write(d_table.Columns[ic]);
if (ic < ColumnCount - 1)
{
sw.Write(FileDelimiter);
}
}
sw.Write(sw.NewLine);
// Write All Rows to the File
foreach (DataRow dr in d_table.Rows)
{
for (int ir = 4; ir < ColumnCount; ir++)
{
if (!Convert.IsDBNull(dr[ir]))
{
sw.Write(dr[ir].ToString());
}
if (ir < ColumnCount - 1)
{
sw.Write(FileDelimiter);
}
}
sw.Write(sw.NewLine);
}
sw.Close();
Dts.TaskResult = (int)ScriptResults.Success;
Blindly calling .ToString() on an object, as the line sw.Write(dr[ir].ToString()); does, uses the default conversion of that data type to a string. If the value is a DateTime (the C# data type, not the SQL Date column type), the result includes the time information.
ADO.NET maps SQL column types (such as Date) to C# data types (DateTime). You need to detect this, just as you already detect whether a value is DBNull.
if (!Convert.IsDBNull(dr[ir]))
{
    if (dr[ir] is DateTime dt)
    {
        // use DateTime's specific string rendering
        sw.Write(dt.ToString("d"));
    }
    else
    {
        // fall back to standard string rendering
        sw.Write(dr[ir].ToString());
    }
}
You can change out the format ("d" in this case) to be something else if you need a different format. Keep in mind that the Culture of a computer will affect how the string is rendered, unless you explicitly use a named Culture.
The other part of your problem is adding quotes around the printed values. This can be done with string concatenation. For example:
string result = "\"" + "my string" + "\"";
// result is "my string", with quotes
Remember to escape the quote mark.
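Putting both points together, a minimal sketch of a formatting helper might look like the following. The names QuotedField and Format are hypothetical. The yyyy-MM-dd pattern is taken from the desired output in the question, while doubling embedded quotes and writing NULL as an empty quoted field are assumptions that the question does not specify.

```csharp
using System;
using System.Globalization;

public static class QuotedField
{
    // Hypothetical helper: renders one cell as a double-quoted string.
    public static string Format(object value)
    {
        string text;
        if (value == null || Convert.IsDBNull(value))
        {
            // Assumption: NULL becomes an empty quoted field.
            text = string.Empty;
        }
        else if (value is DateTime dt)
        {
            // The invariant culture keeps the output identical on every machine.
            text = dt.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        }
        else
        {
            text = Convert.ToString(value, CultureInfo.InvariantCulture);
        }

        // Assumption: embedded quotes are escaped by doubling them, as is common in CSV.
        return "\"" + text.Replace("\"", "\"\"") + "\"";
    }
}
```

In the loops from the question this could be used as sw.Write(QuotedField.Format(dr[ir])); for the rows and sw.Write(QuotedField.Format(d_table.Columns[ic].ColumnName)); for the header.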
I have a csv file with the following header:
"Pickup Date","Pickup Time","Pickup Address","From Zone", and so on..
Using OLE DB, I can only read the first 2 columns and nothing beyond them. I used a schema.ini file with all column names specified. Please suggest a fix.
Here is my sample csv.
"PickupDate","PickupTime","PickupAddress","FromZone"
"11/05/15","4:00:00 AM","9 Houston Rd, CityName, NC 28262,","262"
Here is my code:
Schema.ini
-----------
[ReportResults.csv]
ColNameHeader = True
Format = CSVDelimited
col1=Pickup Date DateTime
col2=Pickup Time Text width 100
col3=Pickup Address Text width 500
col4=FromZone short
oledb code
-----------
public static DataTable SelectCSV(string path, string query)
{
    // since the file contains addresses with , the delimiter ", is used. Each cell is written within "" in the file.
    var strConn = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + path +
        "; Extended Properties='text;HDR=Yes;FMT=Delimited(\",)'";
    OleDbConnection selectConnection = (OleDbConnection)null;
    OleDbDataAdapter oleDbDataAdapter = (OleDbDataAdapter)null;
    selectConnection = new OleDbConnection(strConn);
    selectConnection.Open();
    using (OleDbCommand cmd = new OleDbCommand(query, selectConnection))
    using (oleDbDataAdapter = new OleDbDataAdapter(cmd))
    {
        DataTable dt = new DataTable();
        dt.Locale = CultureInfo.CurrentCulture;
        oleDbDataAdapter.Fill(dt);
        return dt;
    }
}
Every column is enclosed in double quotes, so a comma inside the quotes is not treated as a delimiter.
So you can import your file:
- without using schema.ini
- by specifying EXTENDED PROPERTIES='text;HDR=Yes;FMT=Delimited' in your connection string
If you need to use a schema to solve other problems please note that your schema.ini is not formally correct; use something like this:
[ReportResults.csv]
ColNameHeader = True
Format = CSVDelimited
col1=PickupDate DateTime
col2=PickupTime Text width 100
col3=PickupAddress Text width 500
col4=FromZone short
If you have a problem extracting a DateTime column, specify the DateTimeFormat option; e.g. if your pickup date looks like 2015/11/13, specify DateTimeFormat=yyyy/MM/dd.
If you have a problem extracting a Short column, verify that FromZone is an integer between -32768 and 32767; if not, use a different type. You can also set the DecimalSymbol option if you have a problem with decimal separators.
You can find more info on MSDN.
I have an assignment where I need to read a text file, break each line down into columns, and then insert the result into a database.
What's the best approach for this? Any help will be appreciated, and if you could provide code, even better.
This is what I have so far:
string filename = Server.MapPath("~/Text_File_4.txt");
StreamReader sr = new StreamReader(filename);
string styl;
string colr;
string sdim;
string size;
string qty;
string line;
string sprice;
string sretail;
while ((line = sr.ReadLine()) != null)
{
    styl = line.Substring(0, 6);
    colr = line.Substring(6, 2);
    sdim = line.Substring(8, 1);
    size = line.Substring(14, 3);
    qty = line.Substring(19, 5);
    sprice = line.Substring(27, 6);
    sretail = line.Substring(38, 4);
    con.Open();
    cmd = new SqlCommand("insert into ststyl00(ststyl, stcolr, stsdim, stszcd, stprq, strprq) values(@ststyl, @stcolr, @stsdim, @stszcd, @stprq, @strprq)", con);
    cmd.Parameters.Add("@ststyl", SqlDbType.VarChar, 15).Value = styl;
    cmd.Parameters.Add("@stcolr", SqlDbType.VarChar, 3).Value = colr;
    cmd.Parameters.Add("@stsdim", SqlDbType.VarChar, 8).Value = sdim;
    cmd.Parameters.Add("@stszcd", SqlDbType.VarChar, 3).Value = size;
    cmd.Parameters.Add("@stprq", SqlDbType.VarChar, 8).Value = sprice;
    cmd.Parameters.Add("@strprq", SqlDbType.VarChar, 8).Value = sretail;
    cmd.ExecuteNonQuery();
    con.Close();
}
Input is a CSV
If your input files are CSV files, I strongly recommend using the CSV Reader class available at
http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
Input is Fixed-Width
If your input is fixed-width, just read all of the lines in and parse each individual line into an appropriate structure to store in the database (more on that in a moment).
If you have just a little text to read (perhaps a few megabytes or less), just use File.ReadAllLines (http://msdn.microsoft.com/en-us/library/s2tte0y1) to read all of the lines of the file into a string[].
Writing to the DB
You now have a capability to read in the file. Now, you need to write it out to the database. Presumably there is a DB table with a given schema that matches the data in the file. Have a look at ADO.Net to understand how to write to a database and ask specific questions as needed.
http://msdn.microsoft.com/en-us/library/h43ks021(v=vs.100).aspx
It sounds like your text file needs a delimiter, that is, a character that separates the data into columns, e.g.
data1, data2, data3, data4
The delimiter can be a comma or any other character that does not appear in the regular data. If your text file is in this format, it is easy to parse it and push it to the database.
One approach: open the file with a StreamReader, read it one line at a time, and split each line into columns on the delimiter.
string[] lineData = sr.ReadLine().Split(',');   // replace ',' with your delimiter
foreach (string colData in lineData)
{
    // store data into appropriate collections and push it to database
}
In addition to the other parsing techniques already suggested, you can use the TextFieldParser class (it's in the Microsoft.VisualBasic.FileIO namespace) in conjunction with the ADO.Net code you've already written
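As a rough sketch of that suggestion, the helper below reads delimited text with TextFieldParser and returns each row as a string array. DelimitedText and ReadRows are hypothetical names, and delimited input is an assumption; for fixed-width input the parser also offers FieldType.FixedWidth together with SetFieldWidths.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class DelimitedText
{
    // Hypothetical helper: splits delimited text into rows of fields.
    public static List<string[]> ReadRows(TextReader input, string delimiter)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(delimiter);
            // Delimiters inside "..." stay part of the field.
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

A call such as DelimitedText.ReadRows(new StreamReader(filename), ",") would yield the rows, and each string[] could then feed the parameterized insert from the question.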
I'm a bit confused here.
When I use Excel 2003 to export a sheet to CSV, it actually uses semicolons ...
Col1;Col2;Col3
shfdh;dfhdsfhd;fdhsdfh
dgsgsd;hdfhd;hdsfhdfsh
Now when I read the CSV using the Microsoft drivers, they expect commas and see the list as one big column.
I suspect Excel is exporting with semicolons because I have an AZERTY keyboard. However, shouldn't the CSV reader then also take the different delimiter into account?
How can I find out the appropriate delimiter, and/or read the CSV properly?
public static DataSet ReadCsv(string fileName)
{
DataSet ds = new DataSet();
string pathName = System.IO.Path.GetDirectoryName(fileName);
string file = System.IO.Path.GetFileName(fileName);
OleDbConnection excelConnection = new OleDbConnection
(#"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + pathName + ";Extended Properties=Text;");
try
{
OleDbCommand excelCommand = new OleDbCommand(#"SELECT * FROM " + file, excelConnection);
OleDbDataAdapter excelAdapter = new OleDbDataAdapter(excelCommand);
excelConnection.Open();
excelAdapter.Fill(ds);
}
catch (Exception exc)
{
throw exc;
}
finally
{
if(excelConnection.State != ConnectionState.Closed )
excelConnection.Close();
}
return ds;
}
One way would be to just use a decent CSV library; one that lets you specify the delimiter:
using (var csvReader = new CsvReader("yourinputfile.csv"))
{
    csvReader.ValueSeparator = ';';
    csvReader.ReadHeaderRecord();
    while (csvReader.HasMoreRecords)
    {
        var record = csvReader.ReadDataRecord();
        var col1 = record["Col1"];
        var col2 = record["Col2"];
    }
}
Check what delimiter is specified on your computer. Control Panel > Regional and Language Options > Regional Options tab - click Customize button. There's an option there called "List separator". I suspect this is set to semi-colon.
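The same setting can also be inspected from .NET, which may help when choosing the delimiter in code. This is only a sketch: ListSeparatorInfo and For are hypothetical names, and the value that TextInfo.ListSeparator reports depends on the operating system and runtime, so it is not guaranteed to match what Excel used when it wrote the file.

```csharp
using System.Globalization;

public static class ListSeparatorInfo
{
    // Returns the list separator that .NET reports for the given culture.
    public static string For(CultureInfo culture)
    {
        return culture.TextInfo.ListSeparator;
    }
}
```

For example, ListSeparatorInfo.For(CultureInfo.CurrentCulture) gives the separator for the current user's culture, which could then be passed to whichever CSV reader is used.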
Solution for German Windows 10:
Remember to also change the decimal separator to . and perhaps the thousands separator to a thin space as well.
Can't believe this is true...Comma-separated values are separated by semicolon?
As mentioned by dendarii, the CSV separator that Excel uses is determined by your regional settings, specifically the 'list separator' character.
(And Excel does this erroneously, in my opinion, as it is called a comma-separated file.)
However, if that still does not solve your issue, there is another possible complication:
Check your 'digit grouping' character and ensure that it is NOT a comma.
Excel appears to fall back to a semicolon when exporting decimal numbers if digit grouping is also set to a comma.
Setting the digit grouping to a full stop / period (.) solved this for me.
Using .NET
I have a text file with comma separated data. One of the columns consists of text like the following : 1997/020269/07
Now when I do a select with an OdbcCommand, the string is treated as a float expression and the query returns the 'answer' instead of the actual text!
How can I get the actual text? Am I going to be forced to parse the file manually?
Hope someone can help...please?! :)
Edit: Some code maybe? :)
string strConnString =
#"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" + _FilePath +
#"; Extensions=asc,csv,tab,txt;Persist Security Info=False";
var conn = new System.Data.Odbc.OdbcConnection(strConnString);
var cmd = new System.Data.Odbc.OdbcCommand("select MyColumn from TextFile.txt", conn);
var reader = cmd.ExecuteReader();
while (reader.Read())
{ Console.WriteLine(reader["MyColumn"]); }
This returns 0.014074977 instead of 1997/020269/07
Have you tried using a schema.ini file? It can be used to explicitly define the format of the text file, including data types.
Your schema.ini file might end up looking a little like:
[sourcefilename.txt]
ColNameHeader=true
Format=CSVDelimited
Col1=MyColumn Text Width 14
Col2=...
Try using schema.ini
[yourfile.txt]
ColNameHeader=false
MaxScanRows=0
Format=FixedLength
Col1=MyColumn Text Width 20