CSV is actually .... Semicolon Separated Values ... (Excel export on AZERTY) - c#

I'm a bit confused here.
When I use Excel 2003 to export a sheet to CSV, it actually uses semicolons ...
Col1;Col2;Col3
shfdh;dfhdsfhd;fdhsdfh
dgsgsd;hdfhd;hdsfhdfsh
Now when I read the CSV using the Microsoft drivers, they expect commas and see each row as one big column.
I suspect Excel is exporting with semicolons because I have an AZERTY keyboard. However, doesn't the CSV reader then also have to take the different delimiter into account?
How can I determine the appropriate delimiter, and/or read the CSV properly?
public static DataSet ReadCsv(string fileName)
{
    DataSet ds = new DataSet();
    string pathName = System.IO.Path.GetDirectoryName(fileName);
    string file = System.IO.Path.GetFileName(fileName);
    OleDbConnection excelConnection = new OleDbConnection
        (@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + pathName + ";Extended Properties=Text;");
    try
    {
        OleDbCommand excelCommand = new OleDbCommand(@"SELECT * FROM " + file, excelConnection);
        OleDbDataAdapter excelAdapter = new OleDbDataAdapter(excelCommand);
        excelConnection.Open();
        excelAdapter.Fill(ds);
    }
    catch (Exception)
    {
        throw; // rethrow without resetting the stack trace
    }
    finally
    {
        if (excelConnection.State != ConnectionState.Closed)
            excelConnection.Close();
    }
    return ds;
}
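If you stick with the Jet text driver, you can make it split on semicolons by putting a schema.ini file next to the CSV and declaring the delimiter there. A minimal sketch; the WriteSchemaIni helper is my own name, not part of the original code:
// Hypothetical helper: tell the Jet text driver to use semicolons by
// writing a schema.ini into the same folder as the CSV file.
static void WriteSchemaIni(string fileName)
{
    string pathName = System.IO.Path.GetDirectoryName(fileName);
    string file = System.IO.Path.GetFileName(fileName);
    System.IO.File.WriteAllLines(System.IO.Path.Combine(pathName, "schema.ini"), new[]
    {
        "[" + file + "]",
        "ColNameHeader=True",
        "Format=Delimited(;)" // semicolon instead of the default comma
    });
}
Call WriteSchemaIni(fileName) at the top of ReadCsv above and Fill should then split the columns correctly.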

One way would be to just use a decent CSV library; one that lets you specify the delimiter:
using (var csvReader = new CsvReader("yourinputfile.csv"))
{
    csvReader.ValueSeparator = ';';
    csvReader.ReadHeaderRecord();
    while (csvReader.HasMoreRecords)
    {
        var record = csvReader.ReadDataRecord();
        var col1 = record["Col1"];
        var col2 = record["Col2"];
    }
}
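If you would rather avoid a third-party dependency, the TextFieldParser that ships in the Microsoft.VisualBasic assembly also lets you pick the delimiter and handles quoted fields. A minimal sketch (add a reference to Microsoft.VisualBasic):
using Microsoft.VisualBasic.FileIO;

using (var parser = new TextFieldParser("yourinputfile.csv"))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(";");               // the semicolon Excel exported
    parser.HasFieldsEnclosedInQuotes = true;
    while (!parser.EndOfData)
    {
        string[] fields = parser.ReadFields(); // one record per call
        // fields[0] is Col1, fields[1] is Col2, ...
    }
}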

Check what delimiter is specified on your computer. Control Panel > Regional and Language Options > Regional Options tab - click Customize button. There's an option there called "List separator". I suspect this is set to semi-colon.
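You can also read that setting from code via CultureInfo, which is a reasonable first guess at the delimiter when the file was produced on the same machine (a minimal sketch, not from the original answer):
using System.Globalization;

// Excel's CSV export follows the OS "List separator" setting,
// so the current culture is a sensible default guess.
string listSeparator = CultureInfo.CurrentCulture.TextInfo.ListSeparator;
char delimiter = listSeparator.Length > 0 ? listSeparator[0] : ',';
Note this only helps for files generated with the same regional settings; a file received from elsewhere may use a different separator.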

Solution for German Windows 10:
Note that you also have to change the decimal separator to . and maybe the thousands separator to a thin space as well.
Can't believe this is true... comma-separated values, separated by semicolons?

As mentioned by dendarii, the CSV separator that Excel uses is determined by your regional settings, specifically the 'list separator' character.
(And Excel does this erroneously in my opinion, as it is called a comma separated file.)
HOWEVER, if that still does not solve your issue, there is another possible complication:
Check your 'digit grouping' character and ensure that it is NOT a comma.
Excel appears to revert to semicolons when exporting decimal numbers if digit grouping is also set to a comma.
Setting the digit grouping to a full stop / period (.) solved this for me.

Related

SQL Insert not considering blank values for the insert in my C# code

I have a nice piece of C# code which allows me to import data into a table with fewer columns than in the SQL table (as the file format is consistently bad).
My problem comes when I have a blank entry in a column. The VALUES statement does not pick up an empty column from the CSV, and so I receive the error
You have more insert columns than values
Here is the query printed to a message box: as you can see, there is nothing for Crew members 4 to 11. (The file itself is shown as text further down.)
Please see my code:
SqlConnection ADO_DB_Connection =
    Dts.Connections["ADO_DB_Connection"].AcquireConnection(Dts.Transaction) as SqlConnection;
// Inserting data of file into table
int counter = 0;
string line;
string ColumnList = "";
// MessageBox.Show(fileName);
System.IO.StreamReader SourceFile = new System.IO.StreamReader(fileName);
while ((line = SourceFile.ReadLine()) != null)
{
    if (counter == 0)
    {
        // First line: build the column list from the header row
        ColumnList = "[" + line.Replace(FileDelimiter, "],[") + "]";
    }
    else
    {
        string query = "Insert into " + TableName + " (" + ColumnList + ") ";
        query += "VALUES('" + line.Replace(FileDelimiter, "','") + "')";
        // MessageBox.Show(query.ToString());
        SqlCommand myCommand1 = new SqlCommand(query, ADO_DB_Connection);
        myCommand1.ExecuteNonQuery();
    }
    counter++;
}
If you could advise how to include those fields in the insert that would be great.
Here is the same file but opened with a text editor and not given in picture format...
Date,Flight_Number,Origin,Destination,STD_Local,STA_Local,STD_UTC,STA_UTC,BLOC,AC_Reg,AC_Type,AdultsPAX,ChildrenPAX,InfantsPAX,TotalPAX,AOC,Crew 1,Crew 2,Crew 3,Crew 4,Crew 5,Crew 6,Crew 7,Crew 8,Crew 9,Crew 10,Crew 11
05/11/2022,241,BOG,SCL,15:34,22:47,20:34,02:47,06:13,N726AV,"AIRBUS A-319 ",0,0,0,36,AV,100612,161910,323227
Not touching the potential for SQL injection as I'm free-handing this code. If this is a system-generated file (mainframe extract, dump from Dynamics or a LoB app), the probability of SQL injection is awfully low.
// Char required (FileDelimiter is a string)
char FileDelimiterChar = FileDelimiter[0];
int columnCount = 0;
while ((line = SourceFile.ReadLine()) != null)
{
    if (counter == 0)
    {
        ColumnList = "[" + line.Replace(FileDelimiter, "],[") + "]";
        // How many columns in line 1. Assumes no embedded commas.
        // Add 1 as we will have one fewer delimiter than columns.
        columnCount = line.Count(x => x == FileDelimiterChar) + 1;
    }
    else
    {
        string query = "Insert into " + TableName + " (" + ColumnList + ") ";
        // HACK: this fails if there are embedded delimiters
        int foundColumns = line.Count(x => x == FileDelimiterChar) + 1;
        // At this point, we know how many columns we have
        // and how many we should have.
        string csv = line.Replace(FileDelimiter, "','");
        // Pad out the current line with empty strings aka ','
        // until the VALUES list matches the column list.
        // Probably a classier linq way of doing this or string.Concat approach
        for (int index = foundColumns; index < columnCount; index++)
        {
            csv += "','";
        }
        query += "VALUES('" + csv + "')";
        // MessageBox.Show(query.ToString());
        SqlCommand myCommand1 = new SqlCommand(query, ADO_DB_Connection);
        myCommand1.ExecuteNonQuery();
    }
    counter++;
}
Something like that should get you a solid shove in the right direction. The concept is that you need to inspect the first line and see how many columns you should have. Then for each line of data, how many columns do you actually have and then stub in the empty string.
If you change this up to use SqlCommand objects and parameters, the approximate logic is still the same. You'll add all the expected parameters by figuring out columns in the first line and then for each line you will add your values and if you have a short row, you just send the empty string (or dbnull or whatever your system expects).
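For illustration, a minimal sketch of that parameterized shape (variable names such as headerLine are my assumptions, carried over from the loop above; treat it as untested):
// Build the INSERT once, with one parameter per expected column.
string[] headers = headerLine.Split(FileDelimiterChar);
string paramList = string.Join(",", headers.Select((h, i) => "@p" + i));
string insertSql = "INSERT INTO " + TableName + " (" + ColumnList + ") VALUES (" + paramList + ")";

using (SqlCommand cmd = new SqlCommand(insertSql, ADO_DB_Connection))
{
    for (int i = 0; i < headers.Length; i++)
        cmd.Parameters.Add("@p" + i, System.Data.SqlDbType.NVarChar, 255);

    string[] values = line.Split(FileDelimiterChar);
    for (int i = 0; i < headers.Length; i++)
    {
        // Short row: pad the missing trailing columns with empty strings.
        cmd.Parameters["@p" + i].Value = i < values.Length ? values[i] : "";
    }
    cmd.ExecuteNonQuery();
}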
The big take away IMO is that CSV parsing libraries exist for a reason, and there are so many cases not addressed in the above pseudocode that you'll likely want to trash the current approach in favor of a standard parsing library and then, while you're at it, address the potential security flaws.
I see your updated comment that you'll take the formatting concerns back to the source party. If they can't address them, I would envision your SSIS package being
Script Task -> Data Flow task.
The Script Task is going to wrangle the unruly data into a strict CSV dialect that a Data Flow Task can handle, preprocessing the data into a new file instead of trying to modify the existing one in place.
The Data Flow then becomes a chip shot of Flat File Source -> OLE DB Destination
Here's how you can process this file... I would still ask for Json or XML though.
You need two outputs set up. Flight Info (the 1st 16 columns) and Flight Crew (a business key [flight number and date maybe] and CrewID).
Seems to me the problem is how the crew is handled in the CSV.
So the basic steps are: read the file, use regex to split it, write the first 16 columns to output 1 and the rest (with a key) to the flight crew output, and skip the header row when reading.
var lines = System.IO.File.ReadAllLines("filepath");
for (int i = 1; i < lines.Length; i++) // start at 1 to skip the header row
{
    // Some code I stole to split quoted CSVs
    var r = new System.Text.RegularExpressions.Regex(
        "(?:^|,)(?=[^\"]|(\")?)\"?((?(1)(?:[^\"]|\"\")*|[^,\"]*))\"?(?=,|$)");
    var m = r.Matches(lines[i]); // Gives you all matches in a MatchCollection
    // first 16 columns are always correct
    OutputBuffer0.AddRow();
    OutputBuffer0.Date = m[0].Groups[2].Value;
    OutputBuffer0.FlightNumber = m[1].Groups[2].Value;
    // [And so on until m[15]]
    for (int j = 16; j < m.Count; j++)
    {
        OutputBuffer1.AddRow(); // This is a new output that you need to set up
        OutputBuffer1.FlightNumber = m[1].Groups[2].Value;
        // [Keep adding to make a business key here]
        OutputBuffer1.CrewID = m[j].Groups[2].Value;
    }
}
Be careful as I just typed all this out to give you a general plan without any testing. For example m[0] might actually be m[0].Value and all of the data types will be strings that will need to be converted.
To check out how regex processes your rows, please visit https://regex101.com/r/y8Ayag/1 for explanation. You can even paste in your row data.
UPDATE:
I just tested this and it works now. I needed to escape the quotes in the regex, specify that you want the value of group 2, and use System.IO.File.ReadAllLines (it's IO.File, not File.IO).
The solution that I implemented in the end avoided the script task completely, which also means no SQL injection possibilities.
I did a flat file import: everything into one column, then STRING_SPLIT and a pivot in SQL, inserted into a staging table for tidy-up before going off into the main table.
Flat File Import to single column table -> SQL transform -> Load
This also allowed me to iterate through the files better using a Foreach Loop Container.
ELT on this occasion.
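For illustration only, the "everything into one column" load step might look like this in C# (the RawStaging table name is my invention, and this omits the SQL pivot that follows):
// Hypothetical sketch: parameterized load of raw lines into a
// single-column staging table, ready for STRING_SPLIT in SQL.
using (var cmd = new SqlCommand(
    "INSERT INTO RawStaging (RawLine) VALUES (@line)", ADO_DB_Connection))
{
    cmd.Parameters.Add("@line", System.Data.SqlDbType.NVarChar, -1); // nvarchar(max)
    foreach (string raw in System.IO.File.ReadLines(fileName).Skip(1)) // skip header
    {
        cmd.Parameters["@line"].Value = raw;
        cmd.ExecuteNonQuery();
    }
}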
Thanks for all the help and guidance.

cannot read all the columns from csv using oledb

I have a csv file with the following header:
"Pickup Date","Pickup Time","Pickup Address","From Zone", and so on..
I can only read the first 2 columns and nothing beyond using OleDb. I used a schema.ini file with all column names specified. Please suggest.
Here is my sample csv.
"PickupDate","PickupTime","PickupAddress","FromZone"
"11/05/15","4:00:00 AM","9 Houston Rd, CityName, NC 28262,","262"
Here is my code:
Schema.ini
-----------
[ReportResults.csv]
ColNameHeader = True
Format = CSVDelimited
col1=Pickup Date DateTime
col2=Pickup Time Text width 100
col3=Pickup Address Text width 500
col4=FromZone short
oledb code
-----------
public static DataTable SelectCSV(string path, string query)
{
    // Since the file contains addresses with embedded commas, and each cell
    // is wrapped in double quotes, the delimiter "\", is specified below.
    var strConn = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + path +
        "; Extended Properties='text;HDR=Yes;FMT=Delimited(\",)'";
    using (var selectConnection = new OleDbConnection(strConn))
    {
        selectConnection.Open();
        using (OleDbCommand cmd = new OleDbCommand(query, selectConnection))
        using (var oleDbDataAdapter = new OleDbDataAdapter(cmd))
        {
            DataTable dt = new DataTable();
            dt.Locale = CultureInfo.CurrentCulture;
            oleDbDataAdapter.Fill(dt);
            return dt;
        }
    }
}
Every column is contained in double quotes, so a comma inside double quotes is not treated as a delimiter.
So you can import your file:
without using schema.ini
specifying EXTENDED PROPERTIES='text;HDR=Yes;FMT=Delimited' in your connection string
If you need to use a schema.ini to solve other problems, please note that yours is not formally correct (the column names must match the file's header, which has no spaces); use something like this:
[ReportResults.csv]
ColNameHeader = True
Format = CSVDelimited
col1=PickupDate DateTime
col2=PickupTime Text width 100
col3=PickupAddress Text width 500
col4=FromZone short
If you have problems extracting the DateTime column, specify the DateTimeFormat option; i.e. if your pickup date is something like 2015/11/13, specify DateTimeFormat=yyyy/MM/dd.
If you have problems extracting the Short column, verify that FromZone is an integer between -32768 and 32767; if not, use a different type. You can also set the DecimalSymbol option if you have problems with decimal separators.
You can find more info on MSDN.
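Concretely, the no-schema.ini variant only changes the connection string in the question's code (a sketch of the suggestion above, not tested against the original file):
// Quoted fields plus the default comma delimiter: no schema.ini needed.
var strConn = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + path +
    ";Extended Properties='text;HDR=Yes;FMT=Delimited'";
using (var connection = new OleDbConnection(strConn))
using (var adapter = new OleDbDataAdapter("SELECT * FROM [ReportResults.csv]", connection))
{
    var dt = new DataTable();
    adapter.Fill(dt); // Fill opens and closes the connection itself
    return dt;
}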

Parsing a CSV file problems C#

Having a problem with parsing a CSV file. I connect to the file using the following:
string connString = "Provider=Microsoft.Jet.OLEDB.4.0;"
+ "Data Source=\"" + dir + "\\\";"
+ "Extended Properties=\"text;HDR=No;FMT=Delimited\"";
//create the database query
string query = "SELECT * FROM [" + file + "]";
//create a DataTable to hold the query results
DataTable dTable = new DataTable();
//create an OleDbDataAdapter to execute the query
OleDbDataAdapter dAdapter = new OleDbDataAdapter(query, connString);
//Get the CSV file to change position.
//fill the DataTable
dAdapter.Fill(dTable);
return dTable;
For some reason, the first row reads as a "header" OK (i.e. HDR=Yes allows the values to be displayed). The problem is that with HDR=No, nothing after the first 'cell' is displayed in that row. However, I need HDR=No as I'll be writing the CSV later.
As a quick aside, the rest of the row only has a value in every other column. Also, there is a period in each of these columns. Any help?
Cheers.
EDIT: Here are a fake few lines similar to the CSV:
//Problem row->>
File:,GSK1.D,,GSK2.D,,GSK3.D,
//The following rows, however, are fine:
/ 69,120.3,16.37%,128.9,7.16%,188.92,13.97%
D / 71,48.57,75.50%,32.15,26.65%,58.35,71.43%
T / 89,35.87,45.84%,50.01,28.87%,15.38,43.30%
EDIT: When I put any value into the "blank spaces" above they are parsed, but no matter what I put into the problematic cells (e.g. GSK1.D) they won't parse - unless it is a number! Is there any chance it is automatically converting this cell to a "float" cell? And how can I stop it doing this?
At CodeProject there is a CSV parsing library: http://www.codeproject.com/KB/database/CsvReader.aspx
with an interesting article on how this stuff works. According to the author, it is faster than the OleDb provider.
I have finished this, just to let anyone know who may have this problem in the future. It turns out the reason nothing was being read is that ADO tries to determine a column's type; if other values in that column are not of said type, it removes them completely.
To counter this, you need to create a schema.ini file, like so:
using (StreamWriter writer = new StreamWriter(File.Create(dir + "\\schema.ini")))
{
    writer.WriteLine("[" + fileToBeRead + "]");
    writer.WriteLine("ColNameHeader = False");
    writer.WriteLine("Format = CSVDelimited");
    writer.WriteLine("CharacterSet=ANSI");
    int iColCount = dTable.Columns.Count + 1;
    for (int i = 1; i < iColCount; i++)
    {
        // Declare every column as text so ADO doesn't infer a numeric type
        writer.WriteLine("Col" + i + "=Col" + i + "Name Char Width 20");
    }
    //writer.WriteLine("Col1=Col1Name Char Width 20");
    //writer.WriteLine("Col2=Col2Name Char Width 20");
    //etc.
}
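For a seven-column file like the sample above (with fileToBeRead being the CSV's file name), the generated schema.ini would come out something like:
[yourfile.csv]
ColNameHeader = False
Format = CSVDelimited
CharacterSet=ANSI
Col1=Col1Name Char Width 20
Col2=Col2Name Char Width 20
Col3=Col3Name Char Width 20
Col4=Col4Name Char Width 20
Col5=Col5Name Char Width 20
Col6=Col6Name Char Width 20
Col7=Col7Name Char Width 20
Declaring every column as Char stops ADO from guessing a numeric type and silently dropping values like GSK1.D.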
Thanks for everyone's suggestions!
I've seldom done well with database-style access to text files - the possibilities for "issues" with the file tend to exceed the theoretical time savings.
Personally, I've more often than not hand-crafted the code to do this. A lot (going back over 20+ years, so generic solutions have been thin on the ground). That said, if I had to process a .csv file now, the first thing I'd reach for would be FileHelpers or similar.

Microsoft's Microsoft Text Driver sees text as float operation

Using .NET
I have a text file with comma-separated data. One of the columns consists of text like the following: 1997/020269/07
Now when I do a select with an OdbcCommand, the string is seen as a float (apparently evaluated as division: 1997 / 20269 / 7 ≈ 0.014074977) and it returns the 'answer' instead of the actual text!
How can I get the actual text? Am I going to be forced to parse the file manually?
Hope someone can help... please?! :)
Edit: Some code maybe? :)
string strConnString =
    @"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" + _FilePath +
    @"; Extensions=asc,csv,tab,txt;Persist Security Info=False";
var conn = new System.Data.Odbc.OdbcConnection(strConnString);
var cmd = new System.Data.Odbc.OdbcCommand("select MyColumn from TextFile.txt", conn);
conn.Open(); // the connection must be opened before ExecuteReader
var reader = cmd.ExecuteReader();
while (reader.Read())
{
    Console.WriteLine(reader["MyColumn"]);
}
This returns 0.014074977 instead of 1997/020269/07
Have you tried using a schema.ini file? These can be used to explicitly define the format of the text file, including data types.
Your schema.ini file might end up looking a little like:
[sourcefilename.txt]
ColNameHeader=true
Format=CSVDelimited
Col1=MyColumn Text Width 14
Col2=...
Try using schema.ini
[yourfile.txt]
ColNameHeader=false
MaxScanRows=0
Format=FixedLength
Col1=MyColumn Text Width 20
Bye.

writing long text in excel workbook using interop throws error?

I am writing long text (1K to 2K characters of plain XML data) into a cell in an Excel workbook.
The statement below throws the COM error Exception from HRESULT: 0x800A03EC
range.set_Value(Type.Missing, data);
If I copy-paste the same XML manually into Excel it works fine, but the same does not work programmatically.
If I strip the text down to something like 100-300 chars, it works fine.
There is a limit (somewhere between 800 and 900 chars, if I remember correctly) that is nearly impossible to get around like this.
Try using an OLE connection and inserting the data with an SQL command. That might work better for you. You can then use interop to do any formatting if necessary.
The following KB article explains that the max limit is 911 characters. I checked the same in my code; it does work for strings up to 911 chars.
http://support.microsoft.com/kb/818808
The workaround mentioned in this article recommends making sure no cell holds more than 911 characters. That's lame!
Good OLE and Excel article: http://support.microsoft.com/kb/316934
The following code updates a private variable holding the number of successfully written rows and returns a string which is the path to the Excel file.
Remember to use Path from System.IO!
string tempXlsFilePathName;
string result = string.Empty;
string sheetName;
string queryString;
int successCounter;
// set sheetName and queryString
sheetName = "sheetName";
queryString = "CREATE TABLE " + sheetName + " ([columnTitle] char(255))";
// Write .xls
successCounter = 0;
tempXlsFilePathName = (_tempXlsFilePath + @"\literalFilename.xls");
using (OleDbConnection connection = new OleDbConnection(GetConnectionString(tempXlsFilePathName)))
{
    OleDbCommand command = new OleDbCommand(queryString, connection);
    connection.Open();
    command.ExecuteNonQuery();
    yourCollection.ForEach(dataItem =>
    {
        string SQL = "INSERT INTO [" + sheetName + "$] VALUES ('" + dataItem.ToString() + "')";
        OleDbCommand updateCommand = new OleDbCommand(SQL, connection);
        updateCommand.ExecuteNonQuery();
        successCounter++;
    });
    // update result with the successfully written file path
    result = tempXlsFilePathName;
}
_successfulRowsCount = successCounter;
return result;
N.B. This was edited in a hurry, so may contain some mistakes.
To work around this limitation, write/update only one cell at a time and dispose of the Excel COM object immediately, then recreate the object to write/update the next cell.
I can confirm this solution works in VS2010 (VB.NET project) with the Microsoft Excel 10.0 Object Library (Microsoft Office XP).
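As a rough C# illustration of that one-cell-at-a-time idea (the workbook path and cell address are placeholders, and recreating Excel per write is slow; treat this as a sketch, not the poster's code):
using System.Runtime.InteropServices;
using Excel = Microsoft.Office.Interop.Excel;

static void WriteSingleCell(string workbookPath, string cellAddress, string value)
{
    var app = new Excel.Application();
    Excel.Workbook book = app.Workbooks.Open(workbookPath);
    Excel.Worksheet sheet = (Excel.Worksheet)book.Worksheets[1];
    try
    {
        sheet.Range[cellAddress].Value2 = value;
        book.Save();
    }
    finally
    {
        book.Close();
        app.Quit();
        // Release the COM references so the Excel process actually exits.
        Marshal.ReleaseComObject(sheet);
        Marshal.ReleaseComObject(book);
        Marshal.ReleaseComObject(app);
    }
}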
This limitation is supposed to have been removed in Excel 2007/2010. Using VBA the following works
Sub longstr()
Dim str1 As String
Dim str2 As String
Dim j As Long
For j = 1 To 2000
str1 = str1 & "a"
Next j
Range("a1:a5").Value2 = str1
str2 = Range("a5").Value2
MsgBox Len(str2)
End Sub
I'll start by saying I haven't tried this myself, but my research says that you can use QueryTables to overcome the 911 character limitation.
This is the primary post I found which talks about using a record set as the data source for a QueryTable and adding it to a spreadsheet: http://www.excelforum.com/showthread.php?t=556493&p=1695670&viewfull=1#post1695670.
Here is some sample C# code of using QueryTables: import txt files using excel interop in C# (QueryTables.Add).
