Reading large Excel file with Interop.Excel results in System.OutOfMemoryException - c#

I followed this very promising link to make my program read Excel files, but the problem I get is a System.OutOfMemoryException. As far as I can gather, it happens because of this chunk of code:
object[,] valueArray = (object[,])excelRange.get_Value(
XlRangeValueDataType.xlRangeValueDefault);
which loads the whole range of data into one variable. I do not understand why the developers of the library decided to do it this way instead of providing an iterator that would parse a sheet line by line. So, I need a working solution that would let me read large (>700K rows) Excel files.

I am using the following function in one of my C# applications:
string[,] ReadCells(Excel._Worksheet WS,
                    int row1, int col1, int row2, int col2)
{
    Excel.Range R = WS.get_Range(GetAddress(row1, col1),
                                 GetAddress(row2, col2));
    ....
}
The reason for reading a Range in one go rather than cell by cell is performance: every single cell access involves a lot of internal data transfer. If the Range is too large to fit into memory, you can process it in smaller chunks, for example a fixed number of rows at a time, as sketched below.
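A minimal sketch of that chunked approach, assuming WS is the worksheet passed to the function above and that a hypothetical ProcessBlock method consumes each block of values:

// Reads the worksheet's used range in blocks of rows instead of one huge array.
// Assumes: using Excel = Microsoft.Office.Interop.Excel;
static void ReadInChunks(Excel._Worksheet WS, int chunkRows = 10000)
{
    Excel.Range used = WS.UsedRange;
    int totalRows = used.Rows.Count;
    int totalCols = used.Columns.Count;

    for (int startRow = 1; startRow <= totalRows; startRow += chunkRows)
    {
        int endRow = Math.Min(startRow + chunkRows - 1, totalRows);
        Excel.Range block = WS.Range[WS.Cells[startRow, 1], WS.Cells[endRow, totalCols]];
        object[,] values = (object[,])block.Value2;   // 1-based array holding only this block
        ProcessBlock(values);                         // hypothetical consumer of the block
    }
}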

Related

Populating FarPoint Spread with huge chunk of Data (64-bit Spreadsheet Issue)

In C# (64-bit), I am trying to populate a FarPoint spreadsheet with approximately 70,000 rows. Loading all of the data onto the spreadsheet takes 3-4 hours, which gives the entire process serious performance issues.
Currently I am populating the spreadsheet cell by cell. Is there anything I can do to improve the performance?
Below is my code template for populating the spreadsheet cell by cell.
public void PopulateSpreadsheet(FarPoint.Win.Spread.FpSpread SS, string[] data)
{
    SS.SuspendLayout();
    int rows = 70000;
    for (int i = 0; i < rows; i++)
    {
        // One call per cell; this is the slow part.
        SS.ActiveSheet.Cells[i, 0].Text = data[i];
    }
    SS.ResumeLayout();
}
Please guide me on how to improve the performance. Any help is appreciated! Thank you in advance :)
Well, initially I was storing the data directly on the spreadsheet, which took a lot of time. To overcome this, I first stored my data in a DataTable and then used that DataTable as the data source for the spreadsheet.
Put the data into an object array and use the SetArray method. This is extremely fast and also allows different data types in the array.
I use this often, especially when the data comes from SQL.
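A rough sketch of that approach, assuming SS is an initialized FpSpread, the data is already in memory, and SheetView.SetArray behaves as described (the sample values here are made up):

int rows = 70000, cols = 5;
object[,] values = new object[rows, cols];
for (int r = 0; r < rows; r++)
{
    for (int c = 0; c < cols; c++)
    {
        values[r, c] = $"row {r}, col {c}";   // placeholder data
    }
}

SS.SuspendLayout();
SS.ActiveSheet.RowCount = rows;          // make sure the sheet is big enough
SS.ActiveSheet.ColumnCount = cols;
SS.ActiveSheet.SetArray(0, 0, values);   // one bulk call instead of 350,000 cell assignments
SS.ResumeLayout();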

How to dump data into an Excel file beyond its row limit?

I have more than 2 million rows of data and I want to dump this data into an Excel file, but as given in this specification, an Excel worksheet can contain only 1,048,576 rows.
Consider that I have 40 million rows in the database and I want to dump this data into an Excel file.
I ran a test and got exactly that result: 1,048,576 rows were written successfully, and after that I got this error:
Exception from HRESULT: 0x800A03EC
Code:
for (int i = 1; i <= 1200000; i++)
{
    oSheet.Cells[i, 1] = i;
}
I thought of using a CSV file, but I can't, because we can't apply colors and styles to a CSV file (as per this answer), and my Excel file is going to contain many colors and styles.
Is there any third-party tool, free or paid, through which I can dump more than 2 million rows into an Excel file?
As you said, the current Excel specification (link) has a maximum of 1,048,576 rows per worksheet, but the number of sheets is limited only by available memory.
Maybe separating the content across multiple sheets would be a solution, as sketched below.
Or, if you want to do some analysis on the data, you could aggregate the information before loading it into the Excel file.
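A rough Interop sketch of that multi-sheet split, assuming oWB is an open Workbook and a hypothetical GetRows(offset, count) helper that returns an object[count, columnCount] block:

const int maxRowsPerSheet = 1048576;      // Excel's per-worksheet row limit
long totalRows = 2000000;                 // total rows to export
int sheetCount = (int)((totalRows + maxRowsPerSheet - 1) / maxRowsPerSheet);

for (int s = 0; s < sheetCount; s++)
{
    Excel.Worksheet sheet = (Excel.Worksheet)oWB.Worksheets.Add();
    sheet.Name = "Data_" + (s + 1);

    long offset = (long)s * maxRowsPerSheet;
    int rowsOnThisSheet = (int)Math.Min(maxRowsPerSheet, totalRows - offset);

    object[,] block = GetRows(offset, rowsOnThisSheet);   // hypothetical data-access helper
    int cols = block.GetLength(1);

    Excel.Range target = sheet.Range[sheet.Cells[1, 1], sheet.Cells[rowsOnThisSheet, cols]];
    target.Value2 = block;                // one bulk assignment per sheet
}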

How can I export a very large amount of data to Excel?

I'm currently using EPPlus to export data to Excel. It works admirably for small amounts of data, but it consumes a lot of memory when exporting large amounts.
I've briefly taken a look at OOXML and the Microsoft Open XML SDK 2.5, but I'm not sure whether I can use it to export data to Excel.
There are also third-party libraries.
I wonder which solution could do the job of exporting very large amounts of data properly, with good performance and without taking too much space (ideally less than 3x the amount of data to export)?
Update: some extra requirements...
I need to be able to export "color" information (which excludes CSV), and I would like something easy to use, like the EPPlus library (which excludes the raw XML format itself). I found another thread recommending Aspose or SpreadsheetGear, which I'm trying. I accepted the first answer. Thanks to all.
Update 2016-02-16: Just as information... we now use SpreadsheetGear and we love it. We needed support once, and it was awesome.
Thanks
You say that EPPlus works admirably for a small amount of data but consumes a lot of memory when exporting a large amount.
A few years ago, I wrote a C# library to export data to Excel using the OpenXML library, and I faced the same situation.
It worked fine until you got to about 30k+ rows, at which point the library would try to cache all of your data... and it would run out of memory.
However, I fixed the problem by using the OpenXmlWriter class. This writes the data directly into the Excel file (without caching it first) and is much more memory efficient.
And, as you'll see, the library is incredibly easy to use: just call one CreateExcelDocument function and pass it a DataSet, DataTable or List<>:
// Step 1: Create a DataSet, and put some sample data in it
DataSet ds = CreateSampleData();

// Step 2: Create the Excel .xlsx file
try
{
    string excelFilename = "C:\\Sample.xlsx";
    CreateExcelFile.CreateExcelDocument(ds, excelFilename);
}
catch (Exception ex)
{
    MessageBox.Show("Couldn't create Excel file.\r\nException: " + ex.Message);
    return;
}
You can download the full source code for C# and VB.Net from here:
Mike's Export to Excel
Good luck!
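If you would rather use OpenXmlWriter directly instead of the linked library, a rough sketch of the streaming approach might look like the following (this assumes the DocumentFormat.OpenXml package and writes only numeric cells to keep it short; it is not the linked library's code):

using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

using (var doc = SpreadsheetDocument.Create(@"C:\temp\big.xlsx", SpreadsheetDocumentType.Workbook))
{
    WorkbookPart wbPart = doc.AddWorkbookPart();
    WorksheetPart wsPart = wbPart.AddNewPart<WorksheetPart>();

    // OpenXmlWriter streams XML to the part instead of building the whole DOM in memory.
    using (OpenXmlWriter writer = OpenXmlWriter.Create(wsPart))
    {
        writer.WriteStartElement(new Worksheet());
        writer.WriteStartElement(new SheetData());

        for (int r = 1; r <= 1000000; r++)
        {
            writer.WriteStartElement(new Row());
            writer.WriteElement(new Cell
            {
                DataType = CellValues.Number,
                CellValue = new CellValue(r.ToString())
            });
            writer.WriteEndElement();   // Row
        }

        writer.WriteEndElement();       // SheetData
        writer.WriteEndElement();       // Worksheet
    }

    // Hook the streamed worksheet part into the workbook.
    wbPart.Workbook = new Workbook(
        new Sheets(new Sheet { Id = wbPart.GetIdOfPart(wsPart), SheetId = 1U, Name = "Data" }));
    wbPart.Workbook.Save();
}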
If your requirements are simple enough, you can just use CSV.
If you need more detail, look into SpreadsheetML. It's an XML schema that you can use to create a text document that Excel can open natively. It supports formulas, multiple worksheets per workbook, formatting, etc.
I second using CSV, but note that Excel has limits on the number of rows and columns in a worksheet, as described here:
http://office.microsoft.com/en-us/excel-help/excel-specifications-and-limits-HP010342495.aspx
specifically:
Worksheet size: 1,048,576 rows by 16,384 columns
This is for Excel 2010. Keep these limits in mind when working with very large amounts of data.
As an alternative, you can use my SwiftExcel library. It was designed for high-volume Excel output and writes data directly to the file with no memory impact.
Here is a sample of usage:
using (var ew = new ExcelWriter("C:\\temp\\test.xlsx"))
{
    for (var row = 1; row <= 100; row++)
    {
        for (var col = 1; col <= 10; col++)
        {
            ew.Write($"row:{row}-col:{col}", col, row);
        }
    }
}

How to export a large dataset (around 40k rows and 26 columns) to an Excel sheet

We have a very large dataset (around 40k rows) that I want to write to an Excel file.
I have already tried writing the data to the Excel file cell by cell, and it works.
The problem I face is that it takes a lot of time to write to Excel.
Is there a more efficient way to do this?
Instead of looping through every row/column, assign the full dataset to an array in memory, and then assign this array to a Range object of the same dimensions. In VBA, this would look like:
Dim arr As Variant
arr = ... ' Your code to fill the array here
Workbooks("YourWorkbook").Worksheets("WorksheetName"). _
    Cells(1, 1).Resize(40000, 26).Value = arr
Hope you can translate this to C#...
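A rough C# Interop translation of that idea, assuming worksheet is an Excel.Worksheet and values is an object[40000, 26] array already filled from the dataset:

// Assign the whole 2-D array to a resized Range in a single COM call.
Excel.Range start = (Excel.Range)worksheet.Cells[1, 1];
start.Resize[40000, 26].Value2 = values;   // one bulk write instead of 40,000 x 26 cell assignments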
If you write directly to Excel, the idea from Albert works, but it will still be slow.
I would use a library like EPPlus to write large amounts of data to Excel; it is faster and more reliable.

IEnumerable<T> to Excel (2007) w/ Formatting

I'm looking for a good way to export an IEnumerable<T> to Excel 2007 (.xlsb).
The T is a known type, so reflection is not completely necessary, for performance reasons.
I'm using .xlsb (the Excel binary format) because the amount of data will be large for Excel.
The IEnumerable in question has approximately 2 million records. It is retrieved from an Access database (.mdb), goes through some processing, and finally LINQ queries are written to generate a report structure for T. These records do not need to be sent to Excel as a single set (nor could they be); the data is sub-divided by a condition, and the largest subset will have roughly 1 million records.
I want to be able to convert the data to an Excel pivot table for easy viewing.
My initial idea was to convert the IEnumerable to a 2D array (object[,]) and then push it into an Excel range using COM interop.
public static object[,] To2DArray<T>(this IEnumerable<T> objectList)
{
    Type t = typeof(T);
    PropertyInfo[] fields = t.GetProperties();

    // Note: Count() enumerates the sequence once here, and the foreach below enumerates it again.
    object[,] my2DObject = new object[objectList.Count(), fields.Count()];

    int row = 0;
    foreach (var o in objectList)
    {
        int col = 0;
        foreach (var f in fields)
        {
            my2DObject[row, col] = f.GetValue(o, null) ?? string.Empty;
            col++;
        }
        row++;
    }
    return my2DObject;
}
I then took that object[,] and did what I called a "transaction split": I split the object[,] into smaller chunks, building a List<object[,]>, then went through each one and sent it into an Excel range using something similar to:
Excel.Range range = worksheet.get_Range(cell, cell);
range.Value2 = chunks[0];   // chunks is the List<object[,]>
I obviously loop over the chunks, but for simplicity it looks like the above.
This works, but it takes an enormous amount of time to process: over 30 minutes.
I've dabbled in outputting the IEnumerable to CSV, but that is not very efficient either, since it first requires the .csv file to be created and then opened via COM interop to do the Excel pivot-table formatting.
My question: is there a better (preferred) way to do this?
Should I force execution (ToList()) before iteration?
Should I use a different mechanism to output/display the data?
I'm open to any options for getting a disconnected IEnumerable out to a file in an efficient manner.
I wouldn't be opposed to using something like SQL Express.
The main question is where the bottleneck is. I'd look at the code in a profiler to see which part of the execution is taking a long time. It is also worthwhile to watch your resource usage while the process runs and see whether there is a shortage of CPU or memory, or whether it's disk-bound.
If you're getting sensible performance doing 2,000 records at a time, then I suspect memory may be the issue: with the code you posted, you're converting an IEnumerable (which can avoid loading a complete dataset into memory) into an entirely in-memory structure with potentially a million records. Depending on the size and number of fields involved, this could easily become a problem.
If the problem looks like the time taken to create the Excel file itself (which it doesn't immediately sound like in this case), then COM interop calls can add up, and some third-party Excel libraries aim to be much faster at writing Excel files, particularly with large numbers of records. So rather than necessarily using the Excel binary format and COM, I'd suggest looking at an open-source library like EPPlus (http://epplus.codeplex.com/) and seeing what the performance difference is like.
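For reference, a minimal EPPlus sketch along those lines, assuming an EPPlus 4.x-style API and a reportRows collection produced by the LINQ queries above (path and sheet name are made up):

using System.IO;
using OfficeOpenXml;   // EPPlus

using (var package = new ExcelPackage(new FileInfo(@"C:\temp\report.xlsx")))
{
    ExcelWorksheet ws = package.Workbook.Worksheets.Add("Report");

    // LoadFromCollection maps public properties to columns; 'true' writes a header row.
    ws.Cells["A1"].LoadFromCollection(reportRows, true);

    package.Save();
}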
