How to dump data into an Excel file beyond its row limit? - c#

I have more than 2 million rows of data that I want to dump into an Excel file, but according to the Excel specification a worksheet can contain only 1,048,576 rows.
Suppose I have 40 million rows in the database and want to dump them all into an Excel file.
I ran one test and got the same result: the first 1,048,576 rows were written successfully, and after that I got this error:
Exception from HRESULT: 0x800A03EC Error
Code:
for (int i = 1; i <= 1200000; i++)
{
    oSheet.Cells[i, 1] = i;
}
I thought of using a CSV file, but I can't, because a CSV file cannot carry colors and styles (as per this answer), and my Excel file is going to contain many colors and styles.
Is there any third-party tool, or any other way, to dump more than 2 million rows into an Excel file? I don't mind whether it is paid or free.

As you said, the current Excel specification allows a maximum of 1,048,576 rows per worksheet. The number of sheets, however, is limited only by available memory.
Splitting the content across multiple sheets may therefore be a solution.
Alternatively, if you want to analyze the data, you could aggregate it before loading it into the Excel file.
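The multi-sheet idea can be sketched as simple index arithmetic. This is only an illustration, not tied to any particular Excel library: the class and method names (SheetSplitter, SheetCount, Locate) are hypothetical, and the only fact taken from the thread is the 1,048,576-row limit.

```csharp
using System;

public static class SheetSplitter
{
    // Row limit per worksheet quoted in the question.
    public const int MaxRowsPerSheet = 1_048_576;

    // Number of worksheets needed to hold totalRows rows (ceiling division).
    public static long SheetCount(long totalRows, int rowsPerSheet = MaxRowsPerSheet)
    {
        if (totalRows < 0) throw new ArgumentOutOfRangeException(nameof(totalRows));
        return (totalRows + rowsPerSheet - 1) / rowsPerSheet;
    }

    // Maps a 0-based global row index to a 0-based sheet index
    // and a 1-based row number on that sheet.
    public static (int Sheet, int Row) Locate(long globalIndex, int rowsPerSheet = MaxRowsPerSheet)
    {
        if (globalIndex < 0) throw new ArgumentOutOfRangeException(nameof(globalIndex));
        return ((int)(globalIndex / rowsPerSheet), (int)(globalIndex % rowsPerSheet) + 1);
    }
}
```

For the 40 million rows mentioned in the question, SheetCount returns 39. If each sheet has a header row, pass a smaller rowsPerSheet and offset the row number accordingly.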

Related

How to merge csv and excel?

I am trying to build a simple program that does my weekly job.
Every time I receive a CSV file, I update an Excel file from it.
My CSV looks like this:
key_code,eng_name,...so on
000001,some name,...so on
My Excel file looks like this:
Some text is written in A1-G4
No column headers are written
Data starts at the 5th row
Each row has data in columns B-G (1st row B5-G5, 2nd row B6-G6)
If a key_code in the CSV does not exist in the Excel file, I add the row.
If a key_code in the CSV does exist in the Excel file, I update the remaining columns.
If a key_code in the Excel file does not exist in the CSV, I delete the row.
Can anyone tell me an easy way, or the steps, to get this done?
I am confused about which tool to use to update the Excel file: OleDb, Interop.Excel, EPPlus, Spire.XLS, etc.
I am also unsure which class I should use to hold the CSV data and the Excel data for comparison.
For reading the CSV you can use the ChoETL reader; it is one of the best CSV readers I have used.
The tricky part is writing the Excel file and choosing the right tool. Among the tools you mentioned, EPPlus is the best choice because:
Excel.Interop needs Excel (MS Office) to be installed on the production machine, which can create licensing issues.
OleDb takes some fiddly setup to use well.
EPPlus provides an abstraction that makes it easy to manipulate Excel files.
using System.IO;
using OfficeOpenXml;

using (var p = new ExcelPackage())
{
    // A workbook must have at least one worksheet, so let's add one...
    var ws = p.Workbook.Worksheets.Add("MySheet");
    // To set values in the spreadsheet, use the Cells indexer.
    ws.Cells["A1"].Value = "This is cell A1";
    // Save the new workbook. We haven't specified the filename, so use the SaveAs method.
    p.SaveAs(new FileInfo(@"c:\workbooks\myworkbook.xlsx"));
}
This is a very simple write example from the EPPlus GitHub page. Try it and post any specific issues you run into.
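On the question of which class to hold the data in for comparison, one option is a dictionary keyed by key_code for each side. The sketch below is library-independent and only plans the three rules from the question; KeySync and Plan are hypothetical names, and the keys are kept as strings on the assumption that leading zeros such as 000001 must be preserved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeySync
{
    // Compares rows already in the sheet with rows from the CSV, both keyed by
    // key_code, and reports which keys to add, update, and delete.
    public static (List<string> Add, List<string> Update, List<string> Delete) Plan(
        IReadOnlyDictionary<string, string[]> excelRows,
        IReadOnlyDictionary<string, string[]> csvRows)
    {
        var add = new List<string>();
        var update = new List<string>();
        foreach (var pair in csvRows)
        {
            if (!excelRows.TryGetValue(pair.Key, out var existing))
                add.Add(pair.Key);                      // key only in the CSV
            else if (!existing.SequenceEqual(pair.Value))
                update.Add(pair.Key);                   // key in both, values differ
        }
        // Keys that exist only in the sheet are to be deleted.
        var delete = excelRows.Keys.Where(k => !csvRows.ContainsKey(k)).ToList();
        return (add, update, delete);
    }
}
```

The resulting lists can then drive whichever library performs the actual cell writes and row deletions.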
If key_code in csv does not exist in excel, I add.
If key_code in csv does exist in excel, I update the rest columns.
If key_code in excel does not exist in csv, I delete the row.
As I understand the rules above, the result is the same as deleting the old Excel file and creating a new one from the data in the CSV file.
You can do this very easily in R:
#Install package 'writexl' if you didn't, by install.packages("writexl")
library(writexl)
#File excel
fn <- "file.xlsx"
#Check its existence
if (file.exists(fn))
#Delete file if it exists
file.remove(fn)
#Read the csv file to a data frame
df <- read.csv("C:/newfile.csv")
#Write the data frame to excel file. Change col_names = TRUE if you want the headers.
write_xlsx(
df,
path = "file.xlsx",
col_names = FALSE
)

How to read the header row (assuming first row is header) in excel without loading entire excel file

I am combining multiple large Excel files that have different columns and different numbers of columns.
Before I start combining, I want to collect all the header rows so that I can build a data table containing every column in advance.
I know there is a DataTable.Merge method in C# that can add missing columns while combining.
However, there are many large Excel files, and the maximum number of rows per sheet in Excel is about 1 million. So when I reach the limit, I must save the part combined so far to an Excel file, clear the content, and continue combining. As a result, the parts saved early in the process will not have the same schema as the final one.
This is why I must collect all the headers in advance.
As far as I know, C# libraries such as EPPlus or ExcelDataReader load the entire content of the Excel file, which takes a very long time. I don't need to load all the content at once.
Does anybody know how to load only the header row of an Excel file?
Thank you so much.

Help and advice on using OleDb for large Excel files

I am new to OleDb and I have a project that requires me to grab data from an Excel file using a console application. The Excel file has around 500 columns and 55 rows. How can I get data from the columns past 255?
To read columns 256 and beyond, you just need to modify the Select statement. By default, both the Microsoft.ACE.OLEDB.12.0 and the Microsoft.Jet.OLEDB.4.0 drivers read columns 1-255 (A->IU), but you can ask them to read the remaining columns by specifying them in the Select statement.
To read the next 255 columns, taking "Sheet1" as your sheet name, you would specify:
Select * From [Sheet1$IV:SP]
This will work even if there aren't another 255 columns. It will simply return the next chunk of columns if there are 1...255 additional columns.
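The column letters for each 255-column chunk can be computed rather than worked out by hand. The helper below is a minimal sketch with hypothetical names (ColumnRanges, ColumnName, SelectStatements); it only builds the Select strings in the form shown above and does not open any connection.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnRanges
{
    // Converts a 1-based column number to its Excel letter name (1 -> A, 27 -> AA).
    public static string ColumnName(int column)
    {
        if (column < 1) throw new ArgumentOutOfRangeException(nameof(column));
        var name = string.Empty;
        while (column > 0)
        {
            int remainder = (column - 1) % 26;
            name = (char)('A' + remainder) + name;
            column = (column - 1) / 26;
        }
        return name;
    }

    // Yields one Select statement per chunk of at most 255 columns.
    public static IEnumerable<string> SelectStatements(string sheetName, int totalColumns, int chunk = 255)
    {
        for (int first = 1; first <= totalColumns; first += chunk)
        {
            int last = Math.Min(first + chunk - 1, totalColumns);
            yield return $"Select * From [{sheetName}${ColumnName(first)}:{ColumnName(last)}]";
        }
    }
}
```

For the 500-column sheet in the question this yields two statements, covering A:IU and IV:SF. Each result set would then need to be joined back together by row position.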
Incidentally, the Microsoft.ACE.OLEDB.12.0 driver will read both .xls and any variant of .xlsx, .xlsm etc without changing the extended properties from "Excel 12.0". There is no need to if...then...else the connection string depending on the file type.
The OLEDB driver is pretty good for the most part but it really does rely on well formed sheets. Mixed data types aren't handled terribly well and it does weird things if the first columns/rows are empty but aside from that it's fine. I use it a lot.

C# range.Rows.Count doesn't count right - Excel formating?

I'm using C# to get data from Excel.
To read the data I use this piece of code:
for (int rCnt = 2; rCnt <= range.Rows.Count; rCnt++)
The sheet is 80 rows long, but range.Rows.Count says it is 135 rows long.
I have this problem with two Excel files.
The Excel files are generated from a SharePoint site and have filters and some other formatting.
When I copy the data into an empty Excel file (with Ctrl + A, not manually selected), it counts the right number of rows.
With a third Excel file (also from SharePoint) there is no problem...
Maybe a solution is to change the Excel file first. It is only needed for my program and nothing else, so that would be OK.
Any ideas?
Edit:
I paused the code in the debugger and saw that after row 80 all the entries in the object are "null", so there is no hidden data or anything like that.
Edit 2:
I deleted all the data from that sheet and now it counts 137 rows, so some formatting must be what is being counted...
First of all, you mention that the Excel file has some filtering and formatting. Could it be that some of the formatting is applied to the first 135 rows in the file, and therefore the select returns them all?
Secondly, what do you use to read the Excel file? Do you use OleDb?
And are the rest of the returned rows empty?
If that is the case, you can use: SELECT * FROM SHEET WHERE [Column] IS NOT NULL
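If the data is read into memory instead of through a query, the same filtering can be done by ignoring trailing rows that contain only nulls, which matches what the first edit in the question observed. The sketch below assumes the cell values are available as a two-dimensional object array (for example, what Range.Value2 returns for a multi-cell range in Interop, which is 1-based); UsedRows and LastNonEmptyRow are hypothetical names.

```csharp
using System;

public static class UsedRows
{
    // Returns the index of the last row that contains at least one non-empty cell,
    // or one less than the array's lower row bound when every row is empty.
    // Bounds are read from the array, so both 0-based and 1-based arrays work.
    public static int LastNonEmptyRow(object[,] values)
    {
        int firstRow = values.GetLowerBound(0), lastRow = values.GetUpperBound(0);
        int firstCol = values.GetLowerBound(1), lastCol = values.GetUpperBound(1);
        for (int r = lastRow; r >= firstRow; r--)
        {
            for (int c = firstCol; c <= lastCol; c++)
            {
                var cell = values[r, c];
                if (cell != null && !string.IsNullOrWhiteSpace(cell.ToString()))
                    return r;
            }
        }
        return firstRow - 1;
    }
}
```

The loop in the question could then run up to this value rather than range.Rows.Count.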

How can I export very large amount of data to excel

I'm currently using EPPlus to export data to Excel. It works admirably for small amounts of data, but it consumes a lot of memory when exporting large amounts of data.
I've briefly taken a look at OOXML and the Microsoft Open XML SDK 2.5, but I'm not sure whether I can use them to export data to Excel.
There are also third-party libraries.
Which solution can export a very large amount of data with good performance and without taking too much space (ideally less than 3x the amount of data being exported)?
Update: some extra requirements...
I need to be able to export "color" information (which excludes CSV), and I would like something as easy to manage as the EPPlus library (which excludes writing the XML format directly). I found another thread that recommends Aspose or SpreadsheetGear, which I'm trying. I have accepted the first answer. Thanks to all.
Update 2016-02-16: Just for information, we now use SpreadsheetGear and we love it. We needed support once and it was awesome.
Thanks
EPPlus to export data to Excel. It works admirably for small amounts of data, but it consumes a lot of memory when exporting large amounts of data.
A few years ago, I wrote a C# library to export data to Excel using the OpenXML library, and I faced the same situation.
It worked fine until you had about 30k+ rows, at which point the library would try to cache all of your data and run out of memory.
However, I fixed the problem by using the OpenXmlWriter class. This writes the data directly into the Excel file (without caching it first) and is much more memory efficient.
And, as you'll see, the library is very easy to use: just call one CreateExcelDocument function and pass it a DataSet, DataTable or List<>:
// Step 1: Create a DataSet, and put some sample data in it
DataSet ds = CreateSampleData();
// Step 2: Create the Excel .xlsx file
try
{
    string excelFilename = "C:\\Sample.xlsx";
    CreateExcelFile.CreateExcelDocument(ds, excelFilename);
}
catch (Exception ex)
{
    MessageBox.Show("Couldn't create Excel file.\r\nException: " + ex.Message);
    return;
}
You can download the full source code for C# and VB.Net from here:
Mike's Export to Excel
Good luck !
If your requirements are simple enough, you can just use CSV.
If you need more detail, look into SpreadsheetML. It's an XML schema that you can use to create a text document that Excel can open natively. It supports formulas, multiple worksheets per workbook, formatting, etc.
I second using CSV but note that Excel has limits to the number of rows and columns in a worksheet as described here:
http://office.microsoft.com/en-us/excel-help/excel-specifications-and-limits-HP010342495.aspx
specifically:
Worksheet size 1,048,576 rows by 16,384 columns
This is for Excel 2010. Keep these limits in mind when working with very large amounts of data.
As an alternative, you can use my SwiftExcel library. It was designed for high-volume Excel output and writes data directly to the file, so memory use stays low.
Here is a sample of usage:
using (var ew = new ExcelWriter("C:\\temp\\test.xlsx"))
{
    for (var row = 1; row <= 100; row++)
    {
        for (var col = 1; col <= 10; col++)
        {
            ew.Write($"row:{row}-col:{col}", col, row);
        }
    }
}
