Import data from CSV into Excel - C#

How do I import data into Excel from a CSV file using C#? What I want to achieve is similar to what we do manually in Excel: you go to the Data tab, select the From Text option, then use the Text to Columns option, choose CSV, and it does the magic. I want to automate that process.
If you could point me in the right direction, I'd really appreciate it.
EDIT: I guess I didn't explain it well. What I want to do is something like:
Excel.Application excelApp;
Excel.Workbook excelWorkbook;
// open excel
excelApp = new Excel.Application();
// something like
excelWorkbook.ImportFromTextFile(); // is what I need
I want to import that data into Excel, not into my own application. As far as I know, I shouldn't have to parse the CSV myself and then insert the values into Excel; Excel does that for us. I simply need to know how to automate that process.

I think you're overcomplicating things. Excel automatically splits data into columns by comma delimiters if it's a CSV file, so all you should need to do is ensure your file has a .csv extension.
I just tried opening a CSV file in Excel and it works fine. So all you really need is to call Workbook.Open() on a file with a .csv extension.
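For example, a minimal sketch using the Excel interop (this assumes a reference to Microsoft.Office.Interop.Excel, Excel installed on the machine, and placeholder paths):
using Excel = Microsoft.Office.Interop.Excel;

// open Excel and let it parse the CSV into columns on its own
var excelApp = new Excel.Application();
Excel.Workbook workbook = excelApp.Workbooks.Open(@"C:\data\input.csv");

// optionally save the result as a real workbook afterwards
workbook.SaveAs(@"C:\data\output.xlsx", Excel.XlFileFormat.xlOpenXMLWorkbook);
workbook.Close();
excelApp.Quit();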

You could open Excel, start recording a macro, do what you want, then see what the macro recorded. That should tell you what objects to use and how to use them.

I believe there are two parts: one is the split operation for the CSV, which the other responder has already picked up on and which I don't think is essential, but I'll include anyway. The big one is writing to the Excel file, which I was able to get working, but only under specific circumstances, and it was a pain to accomplish.
CSV is pretty simple; you can do a string.Split on a comma separator if you want. However, this method is horribly broken, albeit I'll admit I've used it myself, mainly because I also have control over the source data and know that no quotes or escape characters will ever appear. I've included a link to an article on proper CSV parsing; however, I have never tested the source or fully audited the code myself (I have used other code by the same author with success): http://www.boyet.com/articles/csvparser.html
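As an aside, the framework also ships a CSV parser that handles quoted fields: TextFieldParser in the Microsoft.VisualBasic.FileIO namespace (usable from C# with a reference to Microsoft.VisualBasic). A minimal sketch:
using System.Collections.Generic;
using Microsoft.VisualBasic.FileIO; // requires a reference to Microsoft.VisualBasic

static IEnumerable<string[]> ReadCsv(string path)
{
    using (var parser = new TextFieldParser(path))
    {
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(",");
        parser.HasFieldsEnclosedInQuotes = true; // copes with "a, b" style fields

        while (!parser.EndOfData)
            yield return parser.ReadFields();
    }
}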
The second part is a lot more complex, and was a huge pain for me. The approach I took was to use the Jet driver to treat the Excel file like a database, and then run SQL queries against it. There are a few limitations, which may mean this doesn't fit your goal. I was looking to use prebuilt Excel file templates to basically display data and some preset functions and graphs. To accomplish this I have several tabs of report data, and one tab which is raw_data. My program writes to the raw_data tab, and all the other tabs' calculations point to cells in this table. I'll go into some of the reasoning for this behavior after the code.
First off, the imports (not all may be required; this is pulled from a larger class file and I didn't properly comment what was for what):
using System.IO;
using System.Diagnostics;
using System.Data.Common;
using System.Globalization;
Next we need to define the connection string. My class already has a FileInfo reference at this point to the file I want to use, so that's what I pass in. You can search Google for what all the parameters are for, but basically it uses the Jet driver (which should be available on any Windows install) to open an Excel file as if you were referring to a database.
string connectString = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={filename};Extended Properties=""Excel 8.0;HDR=YES;IMEX=0""";
connectString = connectString.Replace("{filename}", fi.FullName);
Now let's open the connection to the DB and get ready to run commands against it:
DbProviderFactory factory = DbProviderFactories.GetFactory("System.Data.OleDb");
using (DbConnection connection = factory.CreateConnection())
{
    connection.ConnectionString = connectString;
    using (DbCommand command = connection.CreateCommand())
    {
        connection.Open();
Next we need the actual logic for the DB insertion. So basically throw queries into a loop or whatever your logic is, and insert the data row by row.
string query = "INSERT INTO [raw_aaa$] (correlationid, ipaddr, somenum) VALUES (\"abcdef", \"1.1.1.1", 10)";
command.CommandText = query;
command.ExecuteNonQuery();
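As a side note, string-built queries like this are fragile; the OLE DB provider also accepts positional ? parameters. A rough sketch of the same insert under that assumption:
        // same insert as above, but with positional parameters
        command.CommandText =
            "INSERT INTO [raw_aaa$] (correlationid, ipaddr, somenum) VALUES (?, ?, ?)";
        command.Parameters.Clear();
        foreach (object value in new object[] { "abcdef", "1.1.1.1", 10 })
        {
            DbParameter p = command.CreateParameter();
            p.Value = value;
            command.Parameters.Add(p);
        }
        command.ExecuteNonQuery();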
Now here's the really annoying part: the Excel driver tries to detect your column type before insert, so even if you pass a proper integer value, if Excel thinks the column type is text, it will insert all your numbers as text, and it's very hard to get them treated as numbers afterwards. As such, Excel must already have the column typed as a number. To accomplish this, in my template file I fill in the first 10 rows with dummy data, so that when you load the file in the Jet driver, it can detect the proper types and use them. Then all my formulas that point at my CSV table will operate properly, since the values are of the right type. This may work for you if your goals are similar to mine and you use templates that already point to this data (just start at row 10 instead of row 2).
Because of this, my raw_aaa tab in Excel might look something like this:
correlationid   ipaddr    somenum
abcdef          1.1.1.1   5
abcdef          1.1.1.1   5
abcdef          1.1.1.1   5
abcdef          1.1.1.1   5
abcdef          1.1.1.1   5
abcdef          1.1.1.1   5
abcdef          1.1.1.1   5
abcdef          1.1.1.1   5
Note that row 1 contains the column names I referenced in my SQL queries. I think you can do without this, but that would require a little more research. By already having this data in the Excel file, the somenum column will be detected as a number, and any data inserted will be properly treated as such.
Another note that makes this annoying: the Jet driver is 32-bit only, so in my case, where I had an explicitly 64-bit program, I was unable to execute this directly. So I had the nasty hack of writing to a file, then launching a 32-bit helper program that would insert the data from the file into my Excel template.
All in all, I think the solution is pretty nasty, but thus far I haven't found a better way to do this, unfortunately. Good luck!

You can take a look at the TakeIo.Spreadsheet .NET library. It accepts files in Excel 97-2003, Excel 2007 and newer, and CSV format (semicolon or comma separators).
Example:
var inputFile = new FileInfo("Book1.csv"); // could be .xls or .xlsx too
var sheet = Spreadsheet.Read(inputFile);
foreach (var row in sheet)
{
    foreach (var cell in row)
    {
        // do something with the cell
    }
}
You can remove leading and trailing empty rows, and also leading and trailing empty columns, from the imported data using the Normalize() function:
sheet.Normalize();
Sometimes you may find that your imported data contains empty rows between the data, so there is another helper for that case:
sheet.RemoveEmptyRows();
There is a Serialize() function to convert any input to CSV too:
var outfile = new StreamWriter("AllData.csv");
sheet.Serialize(outfile);
If you'd like to use a comma instead of the default semicolon separator in your CSV file, do:
sheet.Serialize(outfile, ',');
And yes, there is a ToString() function too...
This package is available on NuGet too; just search for TakeIo.Spreadsheet.

You can use ADO.NET
http://vbadud.blogspot.com/2008/09/opening-comma-separate-file-csv-through.html
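The gist of that approach, as a rough sketch (the Jet text driver takes the folder in the connection string and the file name in the query; the paths here are placeholders):
using System.Data;
using System.Data.OleDb;

// Data Source is the folder; the file name goes in the SELECT
var connect = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\data\;" +
              @"Extended Properties=""text;HDR=Yes;FMT=Delimited""";

using (var conn = new OleDbConnection(connect))
{
    var table = new DataTable();
    new OleDbDataAdapter("SELECT * FROM [input.csv]", conn).Fill(table);
    // table now holds the parsed CSV rows
}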

Well, importing from CSV shouldn't be a big deal. I think the most basic method would be to do it using string operations. You could build a pretty decent parser using the simple Split() method and get the fields into arrays.
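A minimal sketch of that approach (with the caveat from the answer above that a plain Split breaks on quoted fields containing commas):
using System.IO;

foreach (var line in File.ReadLines(@"C:\data\input.csv"))
{
    string[] fields = line.Split(',');
    // use fields[0], fields[1], ...
}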

Related

C# reading excel and still getting data from blank cells

I have a tool I made for work. Every week there are 5-20 files for a certain process that fail, and I have to find their job IDs and rerun them.
I made a tool in C# that takes the names of the failed files from an Excel spreadsheet (we'll call it the Failed File Spreadsheet, or FFS if you're feeling cynical) and then cross-references them with a different Excel spreadsheet that has the job IDs, and displays the result in the terminal. It reads the FFS with some fairly simple OleDbDataAdapter code:
public static DataTable GetDataFromExcel(string filename, string sheetName)
{
    using (var oledb = new OleDbConnection(CONN_STR.Replace("<FILENAME>", filename).Replace("<HDR>", "no")))
    {
        var result = new DataSet();
        new OleDbDataAdapter($"SELECT * FROM [{sheetName}]", oledb).Fill(result);
        return result.Tables[0];
    }
}
The tool works fine, mostly. It cross-references with another Excel sheet, I get my job IDs, and I can carry on with my task.
However, there's one slight issue: often when the tool reads from the FFS, it returns blank lines. If last week I had 7 files, and this week I erased those and pasted in 5 files, my tool will show the job IDs for those 5 files just fine, but also show two blanks, as if it's still reading those two extra rows from the previous week. If, however, I make a new blank spreadsheet in Excel, plug in my failed files, and overwrite the saved file, I don't have this issue at all, making me think this is an Excel issue and not a C# coding issue.
Is there a reason why, if I delete the contents of a cell, the OleDbDataAdapter would still be reading those cells? Are there whitespace characters or other hidden characters still present after deleting the contents? I could fix it in the code and just say "don't write it out if the values are whitespace or null", but I want to know why blank cells are being read at all.
This is just a minor bug; it's not stopping me from doing my work, and this tool is nothing more than a personal tool to help with a weekly task. But I'd still like to know why cells that had content, but then had that content deleted, are still being read.
Excel is a little bit quirky like that. If you are manually editing your "Failed File Spreadsheet" (FFS) and, as you say, pasting 5 rows over the existing 7, then you may still read in those extra rows after the data you expect if there is any formatting left on the cells. To avoid this, in Excel select the whole sheet's range of cells, right-click, and choose "Clear Contents".
To be fair, as you alluded to, I think it would be simpler just to fix it in code and skip rows in the DataTable that are empty; there is also a SO post here which shows how to remove empty rows from a DataTable.
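For the code-side fix, a minimal sketch that skips rows whose cells are all null or whitespace (this assumes the DataTable returned by GetDataFromExcel above, plus a reference to System.Data.DataSetExtensions for AsEnumerable):
using System;
using System.Data;
using System.Linq;

var nonEmpty = table.AsEnumerable()
    .Where(row => row.ItemArray.Any(
        cell => cell != DBNull.Value &&
                !string.IsNullOrWhiteSpace(cell.ToString())))
    .ToList();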

Help and advice on using OleDb for large Excel files

So I am new to OleDb, and I have a project that requires me to grab data from an Excel file using a console application. The Excel file has around 500 columns and 55 rows. How can I get data from columns past 255?
In order to read columns 256 onward, you just need to modify the SELECT statement. By default both the Microsoft.ACE.OLEDB.12.0 and the Microsoft.Jet.OLEDB.4.0 drivers will read columns 1-255 (A through IU), but you can ask for the remaining columns by specifying them in the SELECT statement.
To read the next 255 columns, taking "Sheet1" as your sheet name, you would specify:
Select * From [Sheet1$IV:SP]
This will work even if there aren't another 255 columns; it will simply return the next chunk of columns, however many (1-255) additional columns there are.
Incidentally, the Microsoft.ACE.OLEDB.12.0 driver will read both .xls and any variant of .xlsx, .xlsm etc without changing the extended properties from "Excel 12.0". There is no need to if...then...else the connection string depending on the file type.
The OLEDB driver is pretty good for the most part, but it really does rely on well-formed sheets. Mixed data types aren't handled terribly well, and it does weird things if the first columns/rows are empty, but aside from that it's fine. I use it a lot.
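Putting that together, a hypothetical sketch (the file path and sheet name are placeholders) that pulls the second chunk of columns into a DataTable:
using System.Data;
using System.Data.OleDb;

var connect = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\wide.xlsx;" +
              @"Extended Properties=""Excel 12.0;HDR=No""";

using (var conn = new OleDbConnection(connect))
{
    var table = new DataTable();
    // IV:SP covers columns 256-510, i.e. the 255 columns after A-IU
    new OleDbDataAdapter("SELECT * FROM [Sheet1$IV:SP]", conn).Fill(table);
}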

How to import a csv into c-tree

I am using ctreeACE to create a local database, and I was given a CSV file that contains 1000 entries of data. I wanted to know if there is a way to import it without hard-coding it.
Right now I am having to insert line by line with:
INSERT INTO testdata VALUES
('1ZE83A545192635139','2018-06-19 00:00:00',etc)
Note that ctreeACE only allows single row inserts with INSERT...VALUES (Source)
I can't find a way to do this directly, but you could use an online CSV-to-SQL tool to create your insert statements.
First, input your data. You can load the CSV directly; I just hard-coded two sample lines.
Next, set your input options as needed. I used comma separators and ' as the quoting character in the example.
Third, set your output options; this part is pretty self-explanatory.
Last, click CSV to SQL Insert, and it will generate formatted insert statements (one line per insert) for you.
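If you'd rather script it than click through the web tool, here's a rough C# sketch of the same CSV-to-INSERT transformation (it assumes simple fields with no embedded commas, and quotes every value, which may not suit numeric columns):
using System;
using System.IO;
using System.Linq;

foreach (var line in File.ReadLines(@"C:\data\input.csv"))
{
    var values = line.Split(',')
        .Select(f => "'" + f.Trim().Replace("'", "''") + "'");
    Console.WriteLine(
        "INSERT INTO testdata VALUES (" + string.Join(", ", values) + ");");
}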
Hope that helps.

c# Excel import query

I've been asked to import an Excel spreadsheet, which is fine, but I'm getting a problem importing a certain cell that contains both numeric and alphanumeric characters.
Excel example:
        Col A   Col B           Col C
Row 1   0123    8 Fake Address  CF11 1XX
Row 2   XX123   8 Fake Address  CF11 1XX
As per the example above, when the dataset is being loaded it's treating Row 2, col (A) as a numeric field, resulting in an empty cell in the array.
My connection for the OleDb is:
var dbImportConn = new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + dataSource
    + @";Extended Properties=""Excel 8.0;HDR=No;IMEX=1"";")
In this connection I have set IMEX=1, which should parse all contents into the dataset as strings. Also, if I change Row 1 col (A) to 'XX123', the entire col (A) successfully parses as string! Unfortunately this is not going to help my scenario, as the Excel file is passed from an external client, who have also advised that they do not have the means to pass the file through with a header row, which would solve my issue.
My one thought at this point is, when I receive the file, to edit it programmatically to insert a header, but again, as the client may change how many columns are contained, this would not be a safe option for me.
So basically I need to find a solution for dealing with the current format of the spreadsheet and passing all cells through into the array. Has anyone come across this issue before, or does anyone know how to solve it?
I await your thoughts
Thanks
Scott
ps If this is not clear just shout
Hi. There is a registry setting called TypeGuessRows that you can change to tell Excel to scan the whole column before deciding its type. Currently it is set to read the first x rows of a column and then decide the column's type; e.g. if your first x rows are integers and row x+1 is a string, the import will fail because it has already decided that this is an integer column. You can change the registry setting to read the whole column before deciding.
Please also see:
http://jingyangli.wordpress.com/2009/02/13/imex1-revisit-and-typeguessrows-setting-change-to-0-watch-for-performance/
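For reference, the setting normally lives under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel (under Wow6432Node on 64-bit Windows). A hypothetical sketch to check the current value from C#:
using System;
using Microsoft.Win32;

object rows = Registry.GetValue(
    @"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel",
    "TypeGuessRows", null);
Console.WriteLine(rows ?? "not set"); // default is 8; 0 makes the driver scan the whole column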
This isn't a direct answer, but I would like to recommend you use Excel Data Reader, which is open source under the LGPL licence: a lightweight, fast library written in C# for reading Microsoft Excel files ('97-2007).

C#: Reading data from an xls document

I am currently working on a project that traverses an Excel document and inserts data into a database using C#.
The relevant data for this project is:
The excel sheet has 14 rows at the top that I do not care about. (sometimes 15, see Russia/Siberia below)
The data is grouped by name into 2 columns (date and value), such as:
Sheet 1
USA                  China                Russia
Date     Value       Date      Value      Siberia
1/1/09   4.3654      1/1/09    2.7456     Date      Value
1/2/09   3.5545      1/3/09    9.3214     2/5/09    0.2454
1/3/09   3.2322      1/21/09   5.2234     2/6/09    0.5557
The name I need to acquire is whichever is listed directly above "Date".
I only care about data from dates we do not have in the database. Before each column set is parsed, I will acquire the max date for any given name from the database, and skip anything at or before it.
There is no guarantee that the columns will be in a constant order or have constant spacing.
I do not want data for all names, rather only those in a list I put together before the file is acquired.
My current plan is this:
For each column, if the date field is at row 16, save the name as the value in row 15 above it, check the database for the last date for that name, only insert data where the date is greater than the acquired date.
If the date field is at row 17, do the same thing, but start the for loop through each row at 18.
If the name is not in the list, skip the column. If it is, make sure to grab the column next to it for the necessary values.
My problem is:
I am currently trying to use the ExcelDataReader from CodePlex (http://www.codeplex.com/ExcelDataReader). This only likes CSV-like sheets, which this project does not have.
I do not know of any alternative Excel readers.
To the best of my knowledge, a straight FileStream traversal of this file can only go row-by-row, rather than column-by-column.
To anyone still reading, thank you for your time. Any recommendations on how to proceed? Please ensure that solutions can traverse each column, not each row.
Also, please don't worry about the database stuff, or the list of names that precedes the traversal.
Addendum: What I'd really like to end up with is some type of table that I can just traverse with a nested loop, making column-centric traversal much, much easier. Because there is so much garbage near the top of the sheet (14+ rows), most simple solutions are not feasible.
If you want to read from Excel in C#, I've used this library with great success; it'll give you the flexibility to parse columns/rows however you'd like:
http://sourceforge.net/projects/koogra/ (read-only)
Other open source libraries i haven't used but could be good:
http://nexcel.sourceforge.net/ (read-only)
http://npoi.codeplex.com/ (can read and write)
http://developer.novell.com/wiki/index.php/Poi.Net (this project is dead)
Alternatively, you can use one of the many good Java libraries, and convert it into a C# assembly using IKVM:
http://jxls.sourceforge.net/
http://www.andykhan.com/jexcelapi/
http://poi.apache.org/ (this one's the grand-daddy of java XLS libraries)
I've covered how to do the IKVM Java -> C# conversion here (it's really not as horrible an option as you think):
http://splinter.com.au/blog/?p=207
Not a straight answer to your question but an alternative idea:
Your data looks like a pivot-ish table. I'd recommend "unpivoting" it into a simple table.
Example:
     Russia   USA
Q1   123      323
Q2   456      321
Q3   567      843
Becomes:
Quarter   Country   Value
Q1        Russia    123
Q1        USA       323
Q2        Russia    456
....
If that is the case (not sure if I got this right from your question), then processing the data using an OleDB driver or whatever CSV kind of approach should become much less painful.
You can access Excel directly using ADO.NET via the ODBC driver. See http://www.davidhayden.com/blog/dave/archive/2006/05/26/2973.aspx or Google for more info on how to do that. You may wish to try HDR=No in your connection string, since your first row isn't really proper headers by the looks of it.
I haven't done this for a while, but I remember that it is a bit "temperamental" and takes some playing around with to get the column names right, but it should work. Try SELECT * FROM [Sheet1$] and see what you get.
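A rough sketch of that approach via the ODBC Excel driver (the driver name and path here are assumptions; adjust them to whatever is installed on your machine):
using System.Data;
using System.Data.Odbc;

var connect = @"Driver={Microsoft Excel Driver (*.xls)};DBQ=C:\data\report.xls;";

using (var conn = new OdbcConnection(connect))
{
    var table = new DataTable();
    new OdbcDataAdapter("SELECT * FROM [Sheet1$]", conn).Fill(table);
    // table now holds the cells of Sheet1
}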
I highly recommend saving this Excel document in CSV format before doing anything else with it. You can do so using this code.
After you have a CSV, you can either parse it using that library, or write your own parser for it.
As I have done before, I prefer to use an OLEDB connection to connect to an Excel document.
By the way, you can take a look at the following article for more information:
http://www.codeproject.com/KB/office/excel_using_oledb.aspx
SpreadsheetGear for .NET can load workbooks and access any cells on any sheet in any order. You can get the formatted text of the cell (such as "1/1/09") or the underlying value ("1/1/09" is stored as the double 39814.0 in Excel or SpreadsheetGear).
You can see some live ASP.NET samples here and download the free trial here if you want to try it yourself.
Disclaimer: I own SpreadsheetGear LLC
