I am using the open-source EPPlus library to create and edit Excel files on the server.
When I open a file created by EPPlus and insert data into it from C# .NET code, the file size sometimes grows to 8-10 MB.
If I copy all the content from that file into MS Excel and save it, the result is only 20 KB.
What is the actual cause of the increased size of Excel files created by EPPlus?
It was due to unreadable content generated in the Excel file itself. Since I switched to the new release of EPPlus (4.1.0), I have not had this problem.
Related
I am using C#, the Interop library, and WinForms. I have a problem with the PDF exported by the following code. The export itself works:
Workbook wkb = GetApp().Workbooks.Open(excelLocation);
wkb.ExportAsFixedFormat(XlFixedFormatType.xlTypePDF, outputLocation);
wkb.Close();
GetApp().Quit();
It works, but I have problems with the layout. The page in the PDF at 100% zoom (normal size) is:
Too small; I would like the table to be bigger.
When you open the file in Excel, the size is normal. Why is it not normal in the exported PDF file? Do I need extra configuration in ExportAsFixedFormat?
How can I do it?
I have created an application that uses Open XML features to export an Excel file. Initially I tried to fill the Excel file cell by cell, but I ran into performance issues because of the volume of data. Then I copied the DataTable content into an Excel table.
Now I am able to generate bulk data.
But after generating the Excel file I am not able to apply a copy formula.
Context
I have been using EPPlus to automate Excel report generation, with C# as the client language of the library.
Problem:
After trying to write a really big report (the response of a SQL query) with pivot tables, charts and so forth, I end up with an Out of Memory exception.
Troubleshooting
In order to troubleshoot, I decided to open an existing report of 138 MB and use the GC object to take a peek at what is happening with my memory. Here are the results.
ExcelPackage pkg = new ExcelPackage(new FileInfo(@"PATH TO THE REPORT.xlsx"));
ExcelWorkbook wb = pkg.Workbook;
Garbage Collection Results, before the second line of code, and after.
So, I have no idea what to do from here. All I am doing is opening the report, which consumes roughly 10 (9.98, actually) times the report size in memory.
The ~138 MB Excel file takes up 1.370.817.264 bytes of RAM.
Update One:
There's a fairly recent beta version of EPPlus whose changelog lists:
New Cell store
* Less memory consumption
* Insert columns (not on the range level)
* Faster row inserts
After updating the NuGet package, I still get the same exception, but it is now thrown at the first line rather than the second.
Modern Excel files, i.e. .xlsx files, are zip-compressed, and often compress down to about 10% of their original size. I just uncompressed a 1.6 MB file I generated using a similar tool and found that it extracted to 18.8 MB of data.
You've got a 0.138 GB file that is using 1.370 GB of memory, so the file is almost exactly 10% of the in-memory size. The uncompressed representation in memory is what is eating your memory.
If you're curious, you can use a tool like 7-Zip to extract the .xlsx file, or you can rename the file to end in .zip and browse it in Windows.
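The same check can be done from code instead of 7-Zip. The following is a minimal sketch, assuming only the standard System.IO.Compression API; the class and method names (XlsxSizeReport, Measure) are illustrative and not part of EPPlus. It sums the compressed and uncompressed sizes of every entry in the package, which gives a rough idea of how much raw XML a library has to deal with after unzipping.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: names are illustrative, not from any library.
static class XlsxSizeReport
{
    // Returns the total compressed and uncompressed byte counts
    // over all entries of a zip container such as an .xlsx file.
    public static (long Compressed, long Uncompressed) Measure(Stream xlsx)
    {
        long compressed = 0;
        long uncompressed = 0;

        // leaveOpen: true so the caller stays in charge of the stream.
        using (var archive = new ZipArchive(xlsx, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                compressed += entry.CompressedLength;
                uncompressed += entry.Length;
            }
        }

        return (compressed, uncompressed);
    }
}
```

A caller would open the report with File.OpenRead, pass the stream to XlsxSizeReport.Measure, and compare the two totals. The uncompressed total is only a lower bound for memory use, since a library that builds an object model per cell may need more than the raw XML size.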
I ran into this too and found no real solution, so I had to come up with one myself.
It comes as a new library: https://github.com/danielgindi/SpreadsheetStreams.net
It is based on a very old piece of my own code that supported CSV and XML; I refactored the interface, added xlsx support, and published it as a standalone library.
It is not a replacement for EPPlus or other spreadsheet manipulation libraries; it only covers streaming generation of reports, and not all Excel features are supported.
No, ADO.NET will not solve my problem because the excel files I'm working with do not contain information in tabular form. In other words, there is nothing to query, and the name of the sheets and number of sheets will vary.
Essentially, my job is to search every single cell in an Excel document and validate it against some other data.
Right now all I have is a byte[] array that represents the contents of an .xls file. Converting to a string is meaningless since it's just binary data.
If I use COM interop and run Excel in the background, is it possible to inject it with binary data in byte[] array form or do I have to save the file to disk and then automate the process of opening it and scanning each row?
Isn't there an easier way to do it?
How do you read the binary data of an Excel file (.xls) using .NET?
There are a number of ways. The Excel file format has changed a few times, so reading the files natively is hard work and version dependent, and it is usually not recommended. For reading tabular data most people choose ADO.NET but, as you allude to, if you need any formatting or discovery then MS would recommend COM Interop.
If I use COM interop and run Excel in the background, is it possible to inject it with binary data in byte[] array form
The Excel COM object model does allow you to bulk-set data on a Range object: you assign it a two-dimensional object array (object[,]).
or do I have to save the file to disk and then automate the process of opening it and scanning each row?
No, you can interact with the "out of process" COM server (Excel) without having to save first; you can set your data, format it, etc. in memory.
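The bulk assignment mentioned above can be sketched as follows. This is a minimal sketch, assuming a reference to Microsoft.Office.Interop.Excel and an already open worksheet; the class and method names (RangeBulkWrite, Write) are illustrative. Note that it only covers writing values into a range in one call. It does not load the contents of an existing .xls file from a byte[].

```csharp
using Excel = Microsoft.Office.Interop.Excel;

// Hypothetical helper: names are illustrative.
static class RangeBulkWrite
{
    // Writes a rectangular block of values in a single COM call,
    // with values[0, 0] landing in cell A1 of the given sheet.
    public static void Write(Excel.Worksheet sheet, object[,] values)
    {
        int rows = values.GetLength(0);
        int cols = values.GetLength(1);

        // Excel cell indexes are 1-based.
        Excel.Range topLeft = (Excel.Range)sheet.Cells[1, 1];
        Excel.Range bottomRight = (Excel.Range)sheet.Cells[rows, cols];
        Excel.Range target = sheet.Range[topLeft, bottomRight];

        // One assignment instead of rows * cols separate cell writes.
        target.Value2 = values;
    }
}
```

Setting the whole block at once avoids a cross-process call per cell, which is usually where cell-by-cell Interop code spends its time.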
Isn't there an easier way to do it?
Yes, there is. Check out SpreadsheetGear: their object model is nearly identical to the COM model, but you do not need Excel involved at all, and it is an order of magnitude faster when working with large data. It's not cheap ($1000 last time I checked) but it will save you far more than that in coding effort. (I am not affiliated with SpreadsheetGear in any way.)
You could use NPOI to open and read your XLS files. You'll basically want to loop through your sheets, rows and columns looking for data. I commonly use NPOI to read and write XLS forms that contain data in random cells throughout a worksheet.
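The loop described above might look like the sketch below. It assumes the NPOI NuGet package and the binary .xls format (HSSF); the class and method names (XlsCellScanner, Scan) are illustrative. Because HSSFWorkbook accepts a stream, the byte[] from the question can be wrapped in a MemoryStream without saving anything to disk.

```csharp
using System.Collections.Generic;
using System.IO;
using NPOI.HSSF.UserModel;
using NPOI.SS.UserModel;

// Hypothetical helper: names are illustrative.
static class XlsCellScanner
{
    // Yields every stored cell of every sheet in an .xls workbook held in memory.
    public static IEnumerable<(string Sheet, int Row, int Column, string Text)> Scan(byte[] xlsBytes)
    {
        using (var stream = new MemoryStream(xlsBytes))
        {
            IWorkbook workbook = new HSSFWorkbook(stream);

            for (int s = 0; s < workbook.NumberOfSheets; s++)
            {
                ISheet sheet = workbook.GetSheetAt(s);

                for (int r = sheet.FirstRowNum; r <= sheet.LastRowNum; r++)
                {
                    IRow row = sheet.GetRow(r);
                    if (row == null) continue; // rows with no data are not stored

                    // row.Cells contains only the cells that actually exist in the row.
                    foreach (ICell cell in row.Cells)
                    {
                        yield return (sheet.SheetName, cell.RowIndex, cell.ColumnIndex, cell.ToString());
                    }
                }
            }
        }
    }
}
```

Each yielded tuple can then be validated against the other data source. The sketch uses cell.ToString() for simplicity; code that cares about types would inspect cell.CellType instead.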
I have an application that writes huge .csv files, ranging in size from 1 GB to 2 GB.
I need to color code the file and save it as .xlsx.
I have tried using Excel Interop and it works great for small files, but when I try to open a 1.3 GB .csv file with Excel, I get an HRESULT error.
Any ideas as to how I could accomplish this task, either using Excel or in some other way?
Are you exceeding 1M rows?
Maybe that's the reason for the HRESULT error.
The maximum is 65,536 (64K) rows per sheet before Excel 2007, and 1,048,576 (1M) rows from Excel 2007 onward.
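Whether a file exceeds these limits can be checked before handing it to Excel. The following is a minimal sketch using only the standard library; the names (ExcelRowLimit, FitsOnOneSheet) are illustrative. It assumes one CSV record per line, so quoted fields that contain line breaks would be overcounted.

```csharp
using System.IO;

// Hypothetical helper: names are illustrative.
static class ExcelRowLimit
{
    public const int MaxRowsXls = 65536;     // per sheet before Excel 2007
    public const int MaxRowsXlsx = 1048576;  // per sheet from Excel 2007 onward

    // Streams through the text and stops as soon as the limit is exceeded,
    // so a multi-gigabyte file is never loaded into memory.
    public static bool FitsOnOneSheet(TextReader csv, int maxRows)
    {
        long count = 0;
        while (csv.ReadLine() != null)
        {
            count++;
            if (count > maxRows) return false;
        }
        return true;
    }
}
```

A caller would pass File.OpenText(path) together with ExcelRowLimit.MaxRowsXlsx. If the method returns false, the data would have to be split across several sheets or files before Excel could open it.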
There are ways to write and read Excel files without using Excel Interop. I'm pretty sure I remember that Microsoft itself has published open specifications for the Excel file format.
Thanks for the responses, guys. After thinking about it, I have decided to simply use the .csv file.