Opening a huge .csv file with Excel Interop - c#

I have an application that writes huge .csv files, ranging in size from 1 GB to 2 GB.
I need to color-code the file and save it as .xlsx.
I tried Excel Interop and it works great for small files, but when I try to open a 1.3 GB .csv file with Excel, I get an HRESULT error.
Any ideas on how I could accomplish this, either with Excel or in some other way?

Are you exceeding 1M rows?
Maybe that's the reason for the HRESULT error.
A worksheet holds at most 65,536 rows before Excel 2007, and 1,048,576 rows from Excel 2007 onward.
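
To test the row-limit theory before involving Excel at all, something like the sketch below could count the rows in the .csv. The class and method names are made up for illustration, and it assumes no quoted field contains an embedded line break (otherwise the count is too high).

```csharp
using System;
using System.IO;
using System.Linq;

public static class CsvRowCheck
{
    // Row limit of a single .xlsx worksheet (Excel 2007 and later).
    public const long XlsxMaxRows = 1048576;

    // File.ReadLines enumerates lazily, so a multi-gigabyte file
    // is never held in memory as a whole.
    public static long CountRows(string csvPath)
    {
        return File.ReadLines(csvPath).LongCount();
    }

    // True when every row of the file would fit on one worksheet.
    public static bool FitsOnOneSheet(string csvPath)
    {
        return CountRows(csvPath) <= XlsxMaxRows;
    }
}
```

If `FitsOnOneSheet` returns false, the data would have to be split across several sheets or files before Excel can hold it.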

There are ways to read and write Excel files without using Excel Interop. I'm pretty sure Microsoft itself publishes open specifications for the Excel file formats.

Thanks for the responses, guys. After thinking about it, I have decided to simply use the .csv file.

Related

Excel data filtering using OLEDB from huge Excel files (approx. 200 MB) using C# (ADO)

My ADO application needs to read data from one xlsx file (about 10-20 MB minimum), process it row by row, and compare it with another xlsx file (about 250 MB minimum) that contains over 1,000,000 rows and 63 columns and works like a database. When I read the database file (250 MB) and run OLEDB queries against it, it behaves strangely: it returns only the first matching row. But if I open that xlsx database file in Excel and then run my application, it returns all matching rows, without any code change.
I already checked my query; it works fine in Server Explorer and returns the complete result.
The process is also very time-consuming. I tried the OpenXML SAX method to fix the performance issue, but that did not work either; it takes even more time than OLEDB to read the file.
I also have two other issues in my application:
sometimes OLEDB throws the exception 'System resource exceeded', and
sometimes it returns 'internal ole automation error'.
I searched Google for these issues but did not find a solution.
Is there any way to resolve them? Please help; any suggestion is appreciated. Keep in mind that I can't change the xlsx table format because I don't have the rights: these xlsx files are generated automatically by other tools run by our sourcing partners.
Thanks.

Creating excel - improve performance

Good afternoon,
We have a small problem with the performance of generating Excel files.
First, we created the file cell by cell, which was, let's say, unacceptable.
Second, we started inserting the data with a single command by assigning a whole range. That is much faster, but still not perfect, so we are looking for other solutions.
Because we can load an XML file from the database, we tried using XSLT to create an xls file from the two. It works nicely, but opening the file shows an error message (because of a problem or bug in the registry). The user has to accept the message, and only then does Excel open. We want to eliminate this error message, but we don't know how.
We thought about converting this xls file to xlsx, but we can't, because we can't install Office on the server (so we cannot use Interop) and the OpenXML libraries don't work with plain xls files. So my questions are:
Is it possible to generate an xlsx file from the XML file using some XSLT (or something similar)?
Alternatively, which files do we need to create and zip together to produce an xlsx file?
Thank you for any information.
You mention not being able to use the OpenXML libraries because they don't work with .xls files, but you also say "creating cell by cell", which implies that you are generating the file from scratch. Where is the xls file coming from? You mention excel opening, but then say you can't install it on the server. So, it appears to me that a user is uploading an xls file to your server, and then you are doing something with it and giving it back to them? If that is the case and you must be able to read/write an xls file without installing office, then I would suggest using ExcelLibrary, as mentioned in this post
Indeed, creating an xlsx file is orders of magnitude faster with the Open XML SDK.
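
On the "which files do we need to zip together" part: as far as I know, a minimal single-sheet xlsx package needs five parts. The sketch below writes them with nothing but `System.IO.Compression`, so no Office and no SDK on the server. The class name `MinimalXlsx` is made up, and `sheetXml` is assumed to be a complete SpreadsheetML worksheet document, for example the output of your XSLT.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class MinimalXlsx
{
    const string Header = "<?xml version='1.0' encoding='UTF-8'?>";
    const string PkgRel = "http://schemas.openxmlformats.org/package/2006/relationships";
    const string DocRel = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    const string Main = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
    const string Ct = "application/vnd.openxmlformats-officedocument.spreadsheetml.";

    // Writes one XML part into the zip package (UTF-8 without BOM).
    static void AddPart(ZipArchive zip, string name, string xml)
    {
        ZipArchiveEntry entry = zip.CreateEntry(name);
        using (var writer = new StreamWriter(entry.Open(), new UTF8Encoding(false)))
            writer.Write(Header + xml);
    }

    // sheetXml: a full <worksheet> document (assumption: produced elsewhere, e.g. by XSLT).
    public static void Write(string path, string sheetXml)
    {
        using (var file = new FileStream(path, FileMode.Create))
        using (var zip = new ZipArchive(file, ZipArchiveMode.Create))
        {
            AddPart(zip, "[Content_Types].xml",
                "<Types xmlns='http://schemas.openxmlformats.org/package/2006/content-types'>" +
                "<Default Extension='rels' ContentType='application/vnd.openxmlformats-package.relationships+xml'/>" +
                "<Default Extension='xml' ContentType='application/xml'/>" +
                "<Override PartName='/xl/workbook.xml' ContentType='" + Ct + "sheet.main+xml'/>" +
                "<Override PartName='/xl/worksheets/sheet1.xml' ContentType='" + Ct + "worksheet+xml'/>" +
                "</Types>");
            AddPart(zip, "_rels/.rels",
                "<Relationships xmlns='" + PkgRel + "'>" +
                "<Relationship Id='rId1' Type='" + DocRel + "/officeDocument' Target='xl/workbook.xml'/>" +
                "</Relationships>");
            AddPart(zip, "xl/workbook.xml",
                "<workbook xmlns='" + Main + "' xmlns:r='" + DocRel + "'>" +
                "<sheets><sheet name='Sheet1' sheetId='1' r:id='rId1'/></sheets></workbook>");
            AddPart(zip, "xl/_rels/workbook.xml.rels",
                "<Relationships xmlns='" + PkgRel + "'>" +
                "<Relationship Id='rId1' Type='" + DocRel + "/worksheet' Target='worksheets/sheet1.xml'/>" +
                "</Relationships>");
            AddPart(zip, "xl/worksheets/sheet1.xml", sheetXml);
        }
    }
}
```

A worksheet body for a quick try could be `<worksheet xmlns='http://schemas.openxmlformats.org/spreadsheetml/2006/main'><sheetData><row r='1'><c r='A1' t='inlineStr'><is><t>Hello</t></is></c></row></sheetData></worksheet>`. Styles, shared strings and so on are additional parts that this sketch leaves out.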

Huge memory Allocation when using EPPlus Excel Library

Context
I have been using EPPlus to automate Excel report generation, with C# as the client language.
Problem:
After trying to write a really big report (the result of a SQL query) with pivot tables, charts and so forth, I end up with an out-of-memory exception.
Troubleshooting
To troubleshoot, I opened an existing 138 MB report and used the GC object to take a peek at what's happening with my memory. Here are the results.
ExcelPackage pkg = new ExcelPackage(new FileInfo(@"PATH TO THE REPORT.xlsx"));
ExcelWorkbook wb = pkg.Workbook;
Garbage Collection Results, before the second line of code, and after.
So I have no idea what to do from here. All I am doing is opening the report, which consumes roughly 10 (9.98, actually) times the report's own size in memory.
The ~138 MB Excel file takes up 1,370,817,264 bytes of RAM.
Update One:
There's a fairly recent beta version of EPPlus whose changelog includes:
New Cell store
* Less memory consumption
* Insert columns (not on the range level)
* Faster row inserts
After updating the NuGet package, I still get the same exception, but it is now thrown on the first line instead of the second.
Modern Excel files, i.e. xlsx files, are zip-compressed and often compress down to about 10% of their original size. I just uncompressed a 1.6 MB file I generated using a similar tool and found it extracted to 18.8 MB of data.
You've got a 0.138 GB file that is using 1.370 GB of memory, so the file is almost exactly 10% of its in-memory size. The uncompressed representation in memory is what is eating your memory.
If you're curious, you can use a tool like 7-Zip to extract the xlsx file, or you can rename the file to end in .zip and browse it in Windows.
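
The same check can be done from code instead of 7-Zip. This is only a sketch with a made-up class name: it prints the compressed and uncompressed size of every part in the package and returns the uncompressed total, which gives a rough lower bound on what a library that loads everything will need.

```csharp
using System;
using System.IO.Compression;

public static class XlsxSizeReport
{
    // Sums the uncompressed size of all parts inside an xlsx (zip) package.
    public static long TotalUncompressedBytes(string xlsxPath)
    {
        long total = 0;
        using (ZipArchive zip = ZipFile.OpenRead(xlsxPath))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                // CompressedLength is the size on disk, Length the size after extraction.
                Console.WriteLine("{0}: {1} -> {2} bytes",
                    entry.FullName, entry.CompressedLength, entry.Length);
                total += entry.Length;
            }
        }
        return total;
    }
}
```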
As I've encountered this too and found no real solution, I had to come up with one myself.
It comes as a new library: https://github.com/danielgindi/SpreadsheetStreams.net
It is based on a very old piece of code of mine that supported csv and xml; I refactored the interface, added xlsx support, and published it as a standalone library.
This is not a replacement for EPPlus or other spreadsheet manipulation libraries; it is only about streaming generation of reports. Not all Excel features are there, either.

How do you read the binary data of an Excel file (.xls) using .NET?

No, ADO.NET will not solve my problem because the excel files I'm working with do not contain information in tabular form. In other words, there is nothing to query, and the name of the sheets and number of sheets will vary.
Essentially, my job is to search every single cell in an Excel document and validate it against some other data.
Right now all I have is a byte[] array that represents the contents of an .xls file. Converting to a string is meaningless since it's just binary data.
If I use COM interop and run Excel in the background, is it possible to inject it with binary data in byte[] array form or do I have to save the file to disk and then automate the process of opening it and scanning each row?
Isn't there an easier way to do it?
How do you read the binary data of an excel file (.xls) using .NET
There are a number of ways. The Excel file format has changed a few times, so reading the files natively is hard work and version-dependent; it's usually not recommended. For reading tabular data most people choose ADO.NET, but as you allude, if you need any formatting or discovery then MS would recommend COM Interop.
If I use COM interop and run Excel in the background, is it possible to inject it with binary data in byte[] array form
The Excel COM object model does allow you to bulk-set data on a Range object: you assign it a two-dimensional object array (object[,]).
or do I have to save the file to disk and then automate the process of opening it and scanning each row?
No, you can interact with the out-of-process COM server (Excel) without having to save first; you can set your data, format it, etc. in memory.
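
For the bulk Range assignment mentioned above, the data first has to be packed into a rectangular `object[,]`. A minimal sketch of that step follows; `RangeData.ToGrid` is a made-up helper, and the Interop call itself is shown only as a comment because it needs Excel installed.

```csharp
using System.Collections.Generic;

public static class RangeData
{
    // Packs jagged rows into the rectangular object[,] that a Range accepts.
    // Rows shorter than columnCount leave the remaining cells null (empty in Excel).
    public static object[,] ToGrid(IList<object[]> rows, int columnCount)
    {
        var grid = new object[rows.Count, columnCount];
        for (int r = 0; r < rows.Count; r++)
        {
            for (int c = 0; c < columnCount && c < rows[r].Length; c++)
            {
                grid[r, c] = rows[r][c];
            }
        }
        return grid;
    }
}

// With Microsoft.Office.Interop.Excel referenced and a worksheet in hand,
// the whole grid then goes across the COM boundary in a single call:
//   sheet.Range[sheet.Cells[1, 1], sheet.Cells[rows.Count, columnCount]].Value2 = grid;
```

One assignment like this replaces one COM round trip per cell, which is where cell-by-cell code usually loses its time.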
Isn't there an easier way to do it?
Yes, there is. Check out SpreadsheetGear: their object model is nearly identical to the COM model, but you do not need Excel involved at all, and it is an order of magnitude faster when working with large data. It's not cheap ($1000 last time I checked) but will save you way more than that in coding effort. (I am not affiliated with SpreadsheetGear in any way.)
You could use NPOI to open & read your XLS files, you'll basically want to loop through your Sheets / Rows / Columns looking for data. I commonly use NPOI to read & write XLS forms that contain data in random cells throughout a worksheet.

Open a single worksheet (single tab) from a huge excel file on a web browser using c# asp.net / MVC

I have huge Excel files that I have to open from a web browser. It takes several minutes to load a huge file. Is it possible to open a single worksheet (single tab) at a time from an Excel file that contains many worksheets? I have to do this using C# / ASP.NET MVC.
I'm assuming you have the excel workbook on the server and just want to send a single worksheet to the client. Does the user then edit the worksheet? Will they be uploading it back?
Assuming this is just a report, why not use the Open XML SDK to read the workbook, extract the sheet in question, and send it back to the client? This is what @Jim in the comments was suggesting. You can get the SDK here: Open XML SDK 2.0 for Microsoft Office. However, I'm not sure if it will work with the 'old' Excel format. I assume you'll need to save the template workbook in the new Office format (xlsx).
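
To illustrate why touching a single sheet of an xlsx can be cheap: each worksheet is a separate XML part inside the zip package, so one part can be streamed without reading the others. The sketch below does not use the Open XML SDK; it swaps in plain `ZipArchive` plus `XmlReader`. The class name is made up, and the part name (for example "xl/worksheets/sheet1.xml") is an assumption: the real mapping from tab name to part lives in xl/workbook.xml and its .rels file.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml;

public static class SheetStreamer
{
    // Streams one worksheet part and counts its <row> elements.
    // Only this part is decompressed; the other sheets are never read.
    public static int CountRows(string xlsxPath, string partName)
    {
        using (ZipArchive zip = ZipFile.OpenRead(xlsxPath))
        {
            ZipArchiveEntry entry = zip.GetEntry(partName);
            if (entry == null)
                throw new FileNotFoundException("Worksheet part not found: " + partName);

            int rows = 0;
            using (Stream stream = entry.Open())
            using (XmlReader reader = XmlReader.Create(stream))
            {
                // Forward-only read: memory use stays flat regardless of sheet size.
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "row")
                        rows++;
                }
            }
            return rows;
        }
    }
}
```

The same loop could copy cell values out instead of counting, though string cells usually point into xl/sharedStrings.xml, which would have to be read as well.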
Your question is slightly unclear as to where the spreadsheet is stored.
If it's on a server you control, process it there: extract the sheets you need and create other, smaller files from them (or possibly save them in a different format).
If the files are not on a server you control, download the file using C#, then go through a similar process of extracting the sheet before opening it.
Having said that, I've dealt with some largish spreadsheets (20MB or so), and haven't really had a problem processing the entire spreadsheet as a whole.
So where is the bottleneck? Your network or possibly the machine you're running?
Use third party components.
We fought with server-side Excel generation for years and were defeated.
We bought third-party components and all the problems went away.
From your question, it seems you want to improve load time by opening the data from one worksheet instead of the whole workbook. If this is the case and you only want the data, then access the workbook using ADO.NET with the OLEDB provider. (You can use threading to load each worksheet to improve load performance. For instance, loading three large data sets in three worksheets took 17 seconds. Loading each worksheet on a separate thread loaded the same data sets in 5 seconds.)
From experience, performance starts to really suffer with workbooks of 40 MB or more, especially if the workbooks contain many formulas. My largest workbook of 120 MB takes several minutes to load. Using OLEDB access, I can load, access, and process the same data in a few seconds.
If you want the client to open the data in Excel, gather the data via ADO.NET/OLEDB, get XML, and transform it into XMLSS using XSLT. This is easy, and there is plenty of documentation and samples.
If you just want to present the data, gather the data via ADO.NET/OLEDB, get XML, and transform it into HTML using XSLT. This is also easy and well documented.
Be aware that the browser and computer become non-responsive with large data sets. I had to set an upper limit. If the limit was reached, I notified the user that the results were truncated; otherwise, the user thought the computer was "locked".
Take a look at this question in StackOverflow:
Create Excel (.XLS and .XLSX) file from C#
I think you can open your workbook on the server (inside your ASP.NET MVC application) and process only the specific worksheet you want. You can then send such worksheet to the user using NPOI.
The following post shows you how to do that using an ASP.NET MVC application:
Creating Excel spreadsheets .XLS and .XLSX in C#
You can't tell Excel, even via Interop, that you only want a single worksheet. There are many reasons, such as formulas, references and links between worksheets, that make the task impossible.
If you only want to read the data from the worksheet, maybe OLEDB Data Provider is the best option for you. Here is a full example: Reading excel file using OLEDB Data Provider
Otherwise, you will need to load the entire workbook into memory before doing anything with it.
