I am combining multiple large Excel files that have different columns and different numbers of columns.
Before combining them, I want to collect all the header rows so that I can build a DataTable that has every column in advance.
I know that C# has the DataTable.Merge method, which can add missing columns while merging.
However, there are many large Excel files, and an Excel sheet holds at most about 1 million rows (1,048,576). So when I reach that limit, I have to save the part combined so far to Excel, clear the table, and continue combining. As a result, the parts saved early in the process would not have the same schema as the final one.
This is why I need to collect all the headers in advance.
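A minimal sketch of the schema step, assuming each file's header row has already been read into a list of strings: it builds an empty DataTable whose columns are the union of all headers, in first-seen order. The class name SchemaBuilder is hypothetical, and matching names case-insensitively, skipping blank header cells, and typing every column as string are assumptions rather than requirements from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class SchemaBuilder
{
    // Returns an empty DataTable whose columns are the union of all header rows,
    // in the order each name is first seen.
    public static DataTable BuildUnionSchema(IEnumerable<IEnumerable<string>> headerRows)
    {
        var table = new DataTable();
        // Names are compared case-insensitively (assumption: "Name" and "name" are the same column).
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (IEnumerable<string> header in headerRows)
        {
            foreach (string rawName in header)
            {
                string name = rawName?.Trim() ?? "";
                if (name.Length == 0) continue;              // skip blank header cells
                if (seen.Add(name))
                    table.Columns.Add(name, typeof(string)); // every column typed as string (assumption)
            }
        }
        return table;
    }
}
```

Each chunk could then start from schema.Clone(), which copies the columns but none of the rows, so every part saved before the row limit is reached has the same column layout as the final one.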
As far as I can tell, C# libraries such as EPPlus or ExcelDataReader load the entire content of the Excel file, which takes a very long time. I don't need to load all the content at once.
Does anyone here know how to load only the header row of an Excel file?
Thank you so much.
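One possible approach that avoids loading cell data, sketched with only the .NET base library: an .xlsx file is a zip package of XML parts, so a forward-only XmlReader can stop after the first <row> of the sheet's XML, and sharedStrings.xml only needs to be read up to the highest string index the header uses. XlsxHeaderReader is a hypothetical name. Assumptions: the header is the first <row> element present in the sheet; the sheet part is at the path passed in (xl/worksheets/sheet1.xml is a common default, but the real mapping from sheet name to part lives in xl/workbook.xml and its relationships part, which this sketch does not resolve); and header cells that are missing from the XML (gaps) are not filled in.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml;

public static class XlsxHeaderReader
{
    private const string Ns = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    public static List<string> ReadHeader(Stream xlsx, string sheetPath = "xl/worksheets/sheet1.xml")
    {
        using var zip = new ZipArchive(xlsx, ZipArchiveMode.Read, leaveOpen: true);
        ZipArchiveEntry sheet = zip.GetEntry(sheetPath)
            ?? throw new FileNotFoundException("Sheet part not found", sheetPath);

        // Collect (cell type, raw text) for the cells of the first row only.
        var cells = new List<(string Type, string Text)>();
        using (var reader = XmlReader.Create(sheet.Open()))
        {
            if (reader.ReadToFollowing("row", Ns))
            {
                using var row = reader.ReadSubtree();
                while (row.ReadToFollowing("c", Ns))
                {
                    string type = row.GetAttribute("t") ?? "n"; // "s" = shared string index
                    using var cell = row.ReadSubtree();
                    cells.Add((type, ConcatText(cell, "v", "t")));
                }
            }
        } // the reader is closed here; later rows are never parsed

        // Read shared strings in order, stopping at the highest index the header needs.
        int maxIndex = -1;
        foreach (var c in cells)
            if (c.Type == "s") maxIndex = Math.Max(maxIndex, int.Parse(c.Text, CultureInfo.InvariantCulture));
        var shared = new List<string>();
        ZipArchiveEntry sst = zip.GetEntry("xl/sharedStrings.xml");
        if (maxIndex >= 0 && sst != null)
        {
            using var ss = XmlReader.Create(sst.Open());
            while (shared.Count <= maxIndex && ss.ReadToFollowing("si", Ns))
            {
                using var si = ss.ReadSubtree();
                shared.Add(ConcatText(si, "t"));
            }
        }

        var header = new List<string>();
        foreach (var c in cells)
            header.Add(c.Type == "s" ? shared[int.Parse(c.Text, CultureInfo.InvariantCulture)] : c.Text);
        return header;
    }

    // Concatenates the text of every element with one of the given local names
    // (covers plain <t> and rich-text runs); phonetic runs (<rPh>) are skipped.
    private static string ConcatText(XmlReader subtree, params string[] names)
    {
        var text = new StringBuilder();
        subtree.Read();
        while (!subtree.EOF)
        {
            if (subtree.NodeType == XmlNodeType.Element && subtree.LocalName == "rPh")
                subtree.Skip();
            else if (subtree.NodeType == XmlNodeType.Element && Array.IndexOf(names, subtree.LocalName) >= 0)
                text.Append(subtree.ReadElementContentAsString()); // also advances the reader
            else
                subtree.Read();
        }
        return text.ToString();
    }
}
```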
I have a table in an Excel worksheet from which I need to programmatically remove entire rows using VSTO. After a lot of searching here and everywhere else, I was unable to find the answer. Because of some unrelated code, I also cannot delete the first row of the table, but I need to remove all the other rows.
Here are the specific requirements:
One of the functions of this addin is to populate the table. This is done through a loop starting with the "root" named range in the left column of the first row of the table.
Whenever I populate the table, I first need to delete all data from it and then add the new data. I need the "root" to add the data, so it can't be deleted.
I am using the Table for the automated formatting instead of formatting the table manually after adding each cell.
I never know how many rows will be added, but it will always be at least one.
After banging my head on this for a few hours, I slept on it and came at it refreshed this morning. After much trial and error, here is the code I came up with.
// Look up the table by its name on the current sheet.
var deplTable = ThisSheet.Evaluate("DeploymentTable");
// Always delete the second row until only the first ("root") row is left.
while (deplTable.ListObject.ListRows.Count > 1)
{
    deplTable.ListObject.ListRows[2].Delete();
}
NOTE: ThisSheet is set to the correct sheet earlier. The application works on multiple sheets, so it needs to be flexible.
I tried this a few ways before finally getting it to work. Looping through the rows gave unexpected results, possibly due to timing issues between Excel and VSTO.
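An alternative explanation, offered as an assumption rather than a diagnosis: ListRows is a live, 1-based collection, so deleting row i moves every later row up by one, and a forward loop skips every other row. The sketch below uses a List<string> as a stand-in for ListRows (RowDeletion and its methods are hypothetical names) to show the effect and the two patterns that avoid it: always deleting the same index, as the code above does, or walking from the bottom up.

```csharp
using System.Collections.Generic;

// A List<string> stands in for the 1-based ListRows collection; row 1 is the
// "root" row that must survive.
public static class RowDeletion
{
    // Naive forward loop: after a removal, the next row slides into slot i and is skipped.
    public static List<string> ForwardLoop(List<string> rows)
    {
        var copy = new List<string>(rows);
        for (int i = 2; i <= copy.Count; i++)
            copy.RemoveAt(i - 1); // 1-based row i is 0-based index i - 1
        return copy;
    }

    // Backward loop: deleting from the bottom never shifts rows not yet visited.
    public static List<string> BackwardLoop(List<string> rows)
    {
        var copy = new List<string>(rows);
        for (int i = copy.Count; i >= 2; i--)
            copy.RemoveAt(i - 1);
        return copy;
    }
}
```

In the VSTO code, the backward variant would be a loop from ListRows.Count down to 2 that calls ListRows[i].Delete(); that translation is untested.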
Hope this helps other people!
I'm coming across an issue that I'm not sure there's a good answer for.
We have a bulk-insert spreadsheet template to allow people to define certain components of an online ad. They then upload the document, we process it, and set it up in the database.
Recently there was a feature request to change bulk-insert into bulk-edit; in other words, people will download an Excel sheet with information about the current ad prepopulated in the fields on the sheet. They would make changes as a set, then re-upload it, and we'd process the changes and update the database.
The problem is that one of the pieces of information is an HTML snippet with a <script> tag, and Excel seems to delete it automatically, so that column is never populated when the sheet is downloaded. In a way this makes sense: it resembles executable code and could be a serious virus threat under some conditions. But even if I specify the column as pure text (using Style.NumberFormat = "#" in EPPlus), Excel simply makes the entire value disappear. It also appears to skew the columns, shifting the subsequent cells one cell to the left.
Is there any way to (safely) make this work without requiring changes to the downloader's security settings?
I don't have time to check this, but what if you saved the workbook as a macro-enabled workbook (.xlsm), to allow some of the less secure behavior within the workbook?
One other option may be to escape the content with a single quote (') at the beginning of the cell, or to wrap the entire script content in quotes.
What version of Excel do you expect to encounter in the wild? I tested this with Excel 2013 and was able to save the following to a workbook and parse it into a DataTable using EPPlus 4.1.0.0:
<script type="text/javascript">$(document).ready(function() {var I =0; console.info(I+100);});</script>
'<script type="text/javascript">$(document).ready(function() {var I =0; console.info(I+100);});</script>
"<script type='text/javascript'>$(document).ready(function() {var I =0; console.info(I+100);});</script>"
Nothing fancy, just iterating over each cell in the workbook, pulling in the value and converting it to a string:
object obj = Worksheets[WorkSheetIndex].Cells[k, l].Value;
string text = obj?.ToString() ?? string.Empty; // empty cells come back as null
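When a whole block of cells is needed rather than one cell at a time, another option is to read a range's values in one call as a 2-D object array. Assuming the values arrive as object[,] (EPPlus and the Excel interop both expose multi-cell range values that way, though the exact API should be checked for the version in use), the sketch below converts them to a DataTable of strings with the first row as the header. GridConverter and the naming scheme for blank or duplicate headers are hypothetical. Bounds are read with GetLowerBound because interop arrays are 1-based while ordinary C# arrays are 0-based.

```csharp
using System;
using System.Data;
using System.Globalization;

public static class GridConverter
{
    // Converts a rectangular block of cell values into a DataTable of strings.
    // The first row supplies the column names; bounds come from the array itself,
    // so both 0-based and 1-based arrays work.
    public static DataTable ToStringTable(object[,] values)
    {
        var table = new DataTable();
        if (values.GetLength(0) == 0 || values.GetLength(1) == 0) return table;
        int r0 = values.GetLowerBound(0), r1 = values.GetUpperBound(0);
        int c0 = values.GetLowerBound(1), c1 = values.GetUpperBound(1);
        for (int c = c0; c <= c1; c++)
        {
            string header = CellText(values[r0, c]).Trim();
            // Blank headers get a positional name; duplicates get a numeric suffix.
            string baseName = header.Length == 0 ? "Column" + (c - c0 + 1) : header;
            string name = baseName;
            for (int n = 2; table.Columns.Contains(name); n++)
                name = baseName + "_" + n;
            table.Columns.Add(name, typeof(string));
        }
        for (int r = r0 + 1; r <= r1; r++)
        {
            DataRow row = table.NewRow();
            for (int c = c0; c <= c1; c++)
                row[c - c0] = CellText(values[r, c]);
            table.Rows.Add(row);
        }
        return table;
    }

    // null becomes ""; numbers and dates are formatted with the invariant culture.
    public static string CellText(object value) =>
        Convert.ToString(value, CultureInfo.InvariantCulture) ?? string.Empty;
}
```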
Excel 2010.
I have a C# app with a DataSet containing multiple tables. I want to export it to a workbook where each table is a separate sheet (it is important to keep the order of the tables and the names of the DataTables).
One possible solution is to loop through the tables, put each one in its own DataSet, save that DataSet as XML, and then use the Application.Workbooks.OpenXML method.
MSDN OpenXML Documentation
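A rough sketch of the first half of that approach (one XML file per table), assuming writing temporary files to a folder is acceptable. DataSet.Tables keeps tables in the order they were added, so iterating by index preserves the order; each table is copied with DataTable.Copy() because a DataTable can belong to only one DataSet. Table names are not necessarily valid sheet names, so a helper maps them to Excel's rules (at most 31 characters, none of : \ / ? * [ ]). DataSetSplitter, the file-naming scheme, and the '_' replacement are hypothetical; opening each file with Workbooks.OpenXML and naming the resulting sheet is left to the interop code.

```csharp
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;

public static class DataSetSplitter
{
    // Writes each table to its own XML file (with inline schema), in DataSet order,
    // and returns (sheet name, file path) pairs in that same order.
    public static List<(string SheetName, string FilePath)> WriteTablesAsXml(DataSet source, string folder)
    {
        var result = new List<(string SheetName, string FilePath)>();
        for (int i = 0; i < source.Tables.Count; i++)
        {
            DataTable table = source.Tables[i];
            var single = new DataSet();
            single.Tables.Add(table.Copy());                       // a table can belong to only one DataSet
            string path = Path.Combine(folder, $"table_{i:D3}.xml"); // index-based name avoids invalid file characters
            single.WriteXml(path, XmlWriteMode.WriteSchema);        // schema keeps text columns (e.g. UPC) as strings
            result.Add((SafeSheetName(table.TableName), path));
        }
        return result;
    }

    // Replaces characters Excel forbids in sheet names and truncates to 31 characters.
    public static string SafeSheetName(string name)
    {
        char[] invalid = { ':', '\\', '/', '?', '*', '[', ']' };
        string cleaned = new string(name.Select(ch => invalid.Contains(ch) ? '_' : ch).ToArray());
        if (cleaned.Length > 31) cleaned = cleaned.Substring(0, 31);
        return cleaned.Length == 0 ? "Sheet" : cleaned;
    }
}
```

The helper does not enforce uniqueness, and Excel has a few further rules (for example, a name cannot begin or end with an apostrophe), so tables whose cleaned names are identical would still collide.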
But here is the problem: if I pass the third parameter (which gives a very nice import, with filters and everything), Excel succeeds, but it warns me that some columns were imported as text. That is fine with me (one of the columns is a UPC, which should be text, not a number).
Because this message is displayed, the process stops until the user confirms that this is acceptable. Then I start wondering how the mother of all Excels is doing these days.
How to prevent this message from popping up?
Or is there another way to do this import with equally nice results? (Copy and paste works, but not as nicely; writing every cell through automation is way too slow; maybe some Excel library...)
Your turn.
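On the speed point: automation is slow cell by cell largely because each cell write is a separate cross-process COM call. A common workaround, sketched here as an assumption rather than a tested recipe, is to build the whole sheet as one 2-D object array and assign it to a range of the same size in a single call (for example via the interop Range.Value2 property). The hypothetical helper below only builds the array from a DataTable; row 0 holds the column names.

```csharp
using System;
using System.Data;

public static class SheetArray
{
    // Builds a 0-based [Rows.Count + 1, Columns.Count] array:
    // row 0 holds the column names, the remaining rows hold the data.
    public static object[,] FromDataTable(DataTable table)
    {
        int rows = table.Rows.Count, cols = table.Columns.Count;
        var data = new object[rows + 1, cols];
        for (int c = 0; c < cols; c++)
            data[0, c] = table.Columns[c].ColumnName;
        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                object value = table.Rows[r][c];
                // DBNull is mapped to null so the target cell is left empty.
                data[r + 1, c] = value == DBNull.Value ? null : value;
            }
        }
        return data;
    }
}
```

With the interop, the assignment would be roughly target.Value2 = SheetArray.FromDataTable(table), where target is a range of Rows.Count + 1 by Columns.Count cells starting at A1; setting the UPC column's NumberFormat to "@" beforehand is one way to keep it as text. Both steps are untested sketches.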
Try
var excelApplication = new Application { DisplayAlerts = false };
or
Workbook excelWorkBoook = excelApplication.Workbooks.Open(...);
excelWorkBoook.CheckCompatibility = false;
How do I import data into Excel from a CSV file using C#? What I actually want to achieve is what we do manually in Excel: go to the Data tab, choose the From Text option, use the Text to Columns options, select CSV, and it does the magic. I want to automate that.
If you could point me in the right direction, I'd really appreciate it.
EDIT: I guess I didn't explain it well. What I want to do is something like:
Excel.Application excelApp;
Excel.Workbook excelWorkbook;
// open Excel
excelApp = new Excel.Application();
excelWorkbook = excelApp.Workbooks.Add();
// something like this (hypothetical method name)
excelWorkbook.ImportFromTextFile(); // is what I need
I want to import the data into Excel, not into my own application. As far as I know, I shouldn't have to parse the CSV myself and then insert it into Excel; Excel does that for us. I simply need to know how to automate the process.
I think you're overcomplicating things. Excel automatically splits data into columns at the comma delimiters if the file is a CSV file, so all you need to do is make sure the file has a .csv extension.
I just quickly tried opening such a file in Excel and it works fine. So what you really need is just to call Workbooks.Open() on a file with a .csv extension.
You could open Excel, start recording a macro, do what you want, then see what the macro recorded. That should tell you what objects to use and how to use them.
I believe there are two parts. One is the split operation for the CSV, which the other responder has already picked up on; I don't think it is essential, but I'll include it anyway. The big one is writing to the Excel file, which I was able to get working, but only under specific circumstances, and it was a pain to accomplish.
CSV is pretty simple: you can do a string.Split on a comma separator if you want. However, this method is horribly broken. Admittedly I've used it myself, mainly because I also control the source data and know that no quotes or escape characters will ever appear in it. I've included a link to an article on proper CSV parsing; however, I have never tested that code or fully audited it myself. I have used other code by the same author with success. http://www.boyet.com/articles/csvparser.html
The second part is a lot more complex, and was a huge pain for me. The approach I took was to use the Jet driver to treat the Excel file like a database and run SQL queries against it. There are a few limitations, which may make this a poor fit for your goal. I wanted to use prebuilt Excel template files to display data along with some preset functions and graphs. To accomplish this, I have several tabs of report data and one tab called raw_data. My program writes to the raw_data tab, and all the calculations on the other tabs point to cells in this table. I'll go into some of the reasoning for this behavior after the code:
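As a self-contained alternative to string.Split (not the code from the linked article), here is a minimal quote-aware parser sketch. It handles quoted fields, doubled quotes ("") inside them, and line breaks inside quoted fields; it does not trim whitespace or try to recover from malformed input, and it skips blank lines. SimpleCsv is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class SimpleCsv
{
    public static List<List<string>> Parse(TextReader reader, char delimiter = ',')
    {
        var rows = new List<List<string>>();
        var row = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false, rowHasData = false;
        int ch;
        while ((ch = reader.Read()) != -1)
        {
            char c = (char)ch;
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (reader.Peek() == '"') { field.Append('"'); reader.Read(); } // "" is an escaped quote
                    else inQuotes = false;                                         // closing quote
                }
                else field.Append(c); // delimiters and line breaks are literal inside quotes
            }
            else if (c == '"') { inQuotes = true; rowHasData = true; }
            else if (c == delimiter) { row.Add(field.ToString()); field.Clear(); rowHasData = true; }
            else if (c == '\r' || c == '\n')
            {
                if (c == '\r' && reader.Peek() == '\n') reader.Read(); // treat \r\n as one line break
                if (rowHasData || field.Length > 0) { row.Add(field.ToString()); rows.Add(row); }
                row = new List<string>(); field.Clear(); rowHasData = false;
            }
            else { field.Append(c); rowHasData = true; }
        }
        if (rowHasData || field.Length > 0) { row.Add(field.ToString()); rows.Add(row); } // last line without a break
        return rows;
    }
}
```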
First off, the imports (not all may be required, this is pulled from a larger class file and I didn't properly comment what was for what):
using System.IO;
using System.Diagnostics;
using System.Data.Common;
using System.Globalization;
Next, we need to define the connection string. My class already holds a FileInfo reference (fi) to the file I want to use at this point, so that's what I pass in. You can look up what all the parameters are for, but basically this uses the Jet driver (which should be available on any Windows install) to open an Excel file as if it were a database.
string connectString = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={filename};Extended Properties=""Excel 8.0;HDR=YES;IMEX=0""";
connectString = connectString.Replace("{filename}", fi.FullName);
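Instead of calling Replace on a template string, the same connection string can be built with DbConnectionStringBuilder (in System.Data.Common, already imported above), which quotes values as needed, so a path containing a semicolon or a quote cannot break it. JetConnection is a hypothetical helper name; the provider and extended properties are the ones used above.

```csharp
using System.Data.Common;

public static class JetConnection
{
    // Builds the Jet/Excel connection string; the builder adds quotes around
    // values that contain ';' (such as Extended Properties) automatically.
    public static string ForExcel(string path, bool hasHeader = true, int imex = 0)
    {
        var builder = new DbConnectionStringBuilder();
        builder["Provider"] = "Microsoft.Jet.OLEDB.4.0";
        builder["Data Source"] = path;
        builder["Extended Properties"] = $"Excel 8.0;HDR={(hasHeader ? "YES" : "NO")};IMEX={imex}";
        return builder.ConnectionString;
    }
}
```

Used in place of the two lines above: connection.ConnectionString = JetConnection.ForExcel(fi.FullName);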
Now let's open up the connection to the DB, and be ready to run commands on the DB:
DbProviderFactory factory = DbProviderFactories.GetFactory("System.Data.OleDb");
using (DbConnection connection = factory.CreateConnection())
{
connection.ConnectionString = connectString;
using (DbCommand command = connection.CreateCommand())
{
connection.Open();
Next we need the actual logic for inserting into the database. Put the queries into a loop, or whatever your logic is, and insert the data row by row.
// Jet SQL string literals use single quotes.
string query = "INSERT INTO [raw_aaa$] (correlationid, ipaddr, somenum) VALUES ('abcdef', '1.1.1.1', 10)";
command.CommandText = query;
command.ExecuteNonQuery();
}
}
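Building each statement by concatenating values breaks as soon as a value contains a quote. A parameterized sketch, assuming the OLE DB provider's positional ? placeholders (parameters are matched by order, not by name): one helper builds the INSERT text once, another fills the command's parameters for each row. SheetInsert is hypothetical; sheet and column names are bracketed but not escaped, so names containing ] would need extra handling.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Common;
using System.Linq;

public static class SheetInsert
{
    // Builds e.g. "INSERT INTO [raw_aaa$] ([a], [b]) VALUES (?, ?)".
    public static string BuildInsertSql(string sheetName, IReadOnlyList<string> columns)
    {
        string cols = string.Join(", ", columns.Select(c => "[" + c + "]"));
        string marks = string.Join(", ", columns.Select(_ => "?"));
        return $"INSERT INTO [{sheetName}$] ({cols}) VALUES ({marks})";
    }

    // Reuses one command for many rows: replace the parameters in column order, then execute.
    public static int InsertRow(DbCommand command, IReadOnlyList<object> values)
    {
        command.Parameters.Clear();
        foreach (object value in values)
        {
            DbParameter p = command.CreateParameter();
            p.Value = value ?? DBNull.Value;
            command.Parameters.Add(p);
        }
        return command.ExecuteNonQuery();
    }
}
```

Inside the loop, CommandText would be set once from BuildInsertSql("raw_aaa", columns) and InsertRow called per row. The column type detection described next still applies to parameterized inserts.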
Now here's the really annoying part: the Excel driver tries to detect your column type before inserting, so even if you pass a proper integer value, if it thinks the column type is text it will insert all your numbers as text, and it's very hard to get them treated as numbers afterwards. So the column must already be typed as a number in the Excel file. To accomplish this, my template file has the rows below the header (rows 2 to 9) filled with dummy data, so that when the Jet driver loads the file it can detect the proper types and use them (it samples only the first few rows, 8 by default, controlled by the TypeGuessRows registry setting). Then all my formulas that point at my CSV table operate properly, since the values are of the right type. This may work for you if your goals are similar to mine and you use templates that already point to this data (just start at row 10 instead of row 2).
Because of this, my raw_aaa tab in excel might look something like this:
correlationid ipaddr somenum
abcdef 1.1.1.1 5
abcdef 1.1.1.1 5
abcdef 1.1.1.1 5
abcdef 1.1.1.1 5
abcdef 1.1.1.1 5
abcdef 1.1.1.1 5
abcdef 1.1.1.1 5
abcdef 1.1.1.1 5
Note that row 1 holds the column names referenced in my SQL queries. I think you can do without this, but that will require a little more research. Because this data is already in the Excel file, the somenum column is detected as a number, and any inserted data is treated as such.
Another annoyance: the Jet driver is 32-bit only, so in my case, with an explicitly 64-bit program, I was unable to use it directly. I resorted to a nasty hack: write the data to a file, then launch a separate program that inserts the data from that file into my Excel template.
All in all, I think the solution is pretty nasty, but unfortunately I haven't found a better way to do this so far. Good luck!
You can take a look at TakeIo.Spreadsheet .NET library. It accepts files from Excel 97-2003, Excel 2007 and newer, and CSV format (semicolon or comma separators).
Example:
var inputFile = new FileInfo("Book1.csv"); // could be .xls or .xlsx too
var sheet = Spreadsheet.Read(inputFile);
foreach (var row in sheet)
{
foreach (var cell in row)
{
// do something
}
}
You can remove leading and trailing empty rows, as well as leading and trailing empty columns, from the imported data using the Normalize() function:
sheet.Normalize();
Sometimes your imported data contains empty rows in between, so you can use another helper for this case:
sheet.RemoveEmptyRows();
There is also a Serialize() function to convert any input to CSV:
using (var outfile = new StreamWriter("AllData.csv"))
{
    sheet.Serialize(outfile);
}
If you'd like to use a comma instead of the default semicolon separator in your CSV file, do:
sheet.Serialize(outfile, ',');
And yes, there is also a ToString() function too...
This package is also available on NuGet; just look for TakeIo.Spreadsheet.
You can use ADO.NET
http://vbadud.blogspot.com/2008/09/opening-comma-separate-file-csv-through.html
Well, importing from CSV shouldn't be a big deal. I think the most basic method would be to use string operations: you could build a pretty decent parser using a simple Split() call and put the results into arrays.