Optimised way for byte array to memory stream - C#

I use the following lines of code
byte[] byteInfo = workbook.SaveToMemory(FileFormat.OpenXMLWorkbookMacroEnabled);
workStream.Write(byteInfo, 0, byteInfo.Length);
workStream.Position = 0;
return workStream;
to download an Excel file from the browser using C# and SpreadsheetGear.
It works fine for smaller record sets, but when I try to download a workbook with a lot of data (a macro-enabled Excel file with 50k rows and 1k columns), this line
byte[] byteInfo = workbook.SaveToMemory(FileFormat.OpenXMLWorkbookMacroEnabled);
alone takes nearly 4-5 minutes. Is there an optimised way of doing this so that downloading such a huge file takes only 1 or 2 minutes?

Try
workbook.SaveToStream(outputstream, SpreadsheetGear.FileFormat.OpenXMLWorkbook);
Streams can typically be faster because they write the data out as fast as the receiving end will take it, instead of building the whole file up in RAM and spilling into the page file.
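In an ASP.NET page or handler, that could look roughly like this (a sketch only; the content type, file name and use of Response are assumptions, not part of the answer above):

Response.Clear();
Response.BufferOutput = false;
Response.ContentType = "application/vnd.ms-excel.sheet.macroEnabled.12"; // .xlsm
Response.AddHeader("Content-Disposition", "attachment; filename=report.xlsm");

// Write the workbook straight into the response instead of building a byte[] first.
workbook.SaveToStream(Response.OutputStream,
    SpreadsheetGear.FileFormat.OpenXMLWorkbookMacroEnabled);

Response.Flush();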

For large workbooks, the time taken for SpreadsheetGear processing is dominated by the time taken to open the workbook into memory, rather than the time taken to compile the information and download it. You can test this by making a request that opens the workbook and sends a simple success response without the downloaded byte array. Spreadsheet files larger than 1-2 MB start to slow things down, and the process can get very slow beyond 5 MB.
Options are:
- Create an empty spreadsheet and build the content by extracting it from a database and inserting it into the spreadsheet for download. This is faster than opening a large spreadsheet file (see the sketch after this list).
- Get rid of non-essential content that takes up space and increases the size of the spreadsheet.
- Break up the spreadsheet into smaller individual files. You can test this by comparing the speed to process 10 x 1MB files vs 2 x 5MB files vs 1 x 10MB file.
- Work with CSV import / export file processes. They are faster, but obviously not as smart as the SpreadsheetGear functionality.
- Set up a direct download of a stored spreadsheet file, not via SpreadsheetGear. This is faster but obviously requires a static file in store, or one that has been created and stored by another process.
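For the first option, a rough sketch of building the workbook from a database query instead of opening a stored file; GetReportData and outputStream are placeholders, not part of the question:

// Build a new, empty workbook and fill it from a DataTable.
SpreadsheetGear.IWorkbook workbook = SpreadsheetGear.Factory.GetWorkbook();
SpreadsheetGear.IWorksheet sheet = workbook.Worksheets[0];

System.Data.DataTable data = GetReportData(); // hypothetical data-access helper
for (int r = 0; r < data.Rows.Count; r++)
    for (int c = 0; c < data.Columns.Count; c++)
        sheet.Cells[r, c].Value = data.Rows[r][c];

// Stream it out instead of materialising a byte array.
workbook.SaveToStream(outputStream, SpreadsheetGear.FileFormat.OpenXMLWorkbook);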

Related

Extracting a 7z file with millions of 200-bit files inside takes hours to finish. How can I speed it up?

Good day. I've created my own custom wizard installer for my website project. My goal is to minimize the work our client has to do during installation.
I'm trying to extract a 7z file that has millions of tiny files (200 bits each) inside. I'm using sharpcompress for the extraction, but it seems it will take hours to finish the task, which is very bad for the user.
I don't care about compression. What I need is to reduce the time it takes to extract these millions of tiny files or, if possible, to speed up the extraction in some other way.
My question is: what is the fastest way to extract millions of tiny files? Or is there any method to pack and unpack the files with the highest unpacking speed?
I'm trying to extract the 7z file by this code:
using (SevenZipArchive zipArchive = SevenZipArchive.Open(source7z))
{
zipArchive.WriteToDirectory(destination7z,
new ExtractionOptions { Overwrite = true, ExtractFullPath = true });
}
But the extraction is very slow for these tiny files.
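One thing worth checking (not covered in the question): 7z archives are usually solid, and extracting a solid archive entry by entry through the archive API can force SharpCompress to re-decompress from the start of the stream for each entry. Reading the entries sequentially with the reader API does a single forward pass instead; a sketch, assuming a recent SharpCompress version:

using SharpCompress.Archives.SevenZip;
using SharpCompress.Common;
using SharpCompress.Readers;

// Sequential extraction: one pass over the compressed stream.
using (SevenZipArchive archive = SevenZipArchive.Open(source7z))
using (IReader reader = archive.ExtractAllEntries())
{
    var options = new ExtractionOptions { Overwrite = true, ExtractFullPath = true };
    while (reader.MoveToNextEntry())
    {
        if (!reader.Entry.IsDirectory)
            reader.WriteEntryToDirectory(destination7z, options);
    }
}

With millions of tiny files, much of the remaining cost is file-system overhead from creating that many files, which no archive library can avoid.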

Huge memory allocation when using the EPPlus Excel library

Context
I have been using EPPlus as my tool to automate Excel report generation, using C# as the client language of the library.
Problem:
After trying to write a really big report (the result set of a SQL query) with pivot tables, charts and so forth, I end up with an OutOfMemoryException.
Troubleshooting
In order to troubleshoot, I decided to open an existing 138 MB report and use the GC class to take a peek at what's happening with my memory. Here are the results.
ExcelPackage pkg = new ExcelPackage(new FileInfo(@"PATH TO THE REPORT.xlsx"));
ExcelWorkbook wb = pkg.Workbook;
Garbage collection results, before and after the second line of code.
So, I have no idea what to do from here. All I am doing is opening the report, and that consumes roughly 10 times (9.98, actually) the size of the report itself in memory.
The ~138 MB Excel file takes up 1,370,817,264 bytes (~1.37 GB) of RAM.
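The measurement can be reproduced along these lines (a sketch, not the original code; GC.GetTotalMemory only reports the managed heap):

long before = GC.GetTotalMemory(true); // force a full collection first

ExcelPackage pkg = new ExcelPackage(new FileInfo(@"PATH TO THE REPORT.xlsx"));
ExcelWorkbook wb = pkg.Workbook;

long after = GC.GetTotalMemory(true);
Console.WriteLine("Managed heap grew by " + (after - before).ToString("N0") + " bytes");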
Update One:
There's a fairly recent beta version of EPPlus out whose changelog includes:
New Cell store
* Less memory consumption
* Insert columns (not on the range level)
* Faster row inserts
After updating the NuGet package, I still get the same exception; it is now thrown after the first line instead of being raised on the second line.
Modern Excel files, i.e. .xlsx files, are zip-compressed and often compress down to around 10% of the uncompressed data. I just uncompressed a 1.6 MB file I generated using a similar tool and found it extracted to 18.8 MB of data.
You've got a 0.138 GB file that is using 1.37 GB of memory, which is almost exactly that 10% ratio. The uncompressed representation in memory is what is eating your memory.
If you're curious, you can use a tool like 7-Zip to extract an .xlsx file, or rename it to end in .zip and browse it in Windows.
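If you want the ratio programmatically rather than via 7-Zip, System.IO.Compression can open the .xlsx as a zip (a sketch; on .NET Framework this needs references to System.IO.Compression and System.IO.Compression.FileSystem):

using System.IO.Compression;
using System.Linq;

// Compare on-disk (compressed) vs. unpacked size of the parts inside the .xlsx.
using (ZipArchive zip = ZipFile.OpenRead(@"PATH TO THE REPORT.xlsx"))
{
    long compressed = zip.Entries.Sum(e => e.CompressedLength);
    long uncompressed = zip.Entries.Sum(e => e.Length);
    Console.WriteLine(compressed.ToString("N0") + " bytes on disk -> "
                      + uncompressed.ToString("N0") + " bytes unpacked");
}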
As I've encountered this too and found no real solution, I had to come up with one myself.
It comes as a new library: https://github.com/danielgindi/SpreadsheetStreams.net
It is based on a very old piece of code of mine that supported CSV and XML; I refactored the interface, added xlsx support, and published it as a standalone library.
This is not a replacement for EPPlus or other spreadsheet manipulation libraries; it is only about streaming generation of reports. Not all Excel features are available either.

Updating large binary files

There are a number of large binary files (>1 GB each) that need to be partially updated with no length changes. How do I update them as fast as possible? It looks like writes are being buffered before being saved to disk when I use a standard FileStream. I would like to set certain bytes directly in the file on disk.
Thanks.
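Not from the original thread, but the usual pattern for an in-place patch is to open the file without truncating it, seek to the offset, and write only the changed bytes; FileOptions.WriteThrough asks the OS to skip its write cache if that really matters. A sketch with hypothetical offset / newBytes values:

// Patch bytes in place without rewriting the whole file.
using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write,
                               FileShare.None, 4096, FileOptions.WriteThrough))
{
    fs.Seek(offset, SeekOrigin.Begin);
    fs.Write(newBytes, 0, newBytes.Length);
    fs.Flush(true); // flush any intermediate buffers to disk as well
}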

What is the best / fastest way to export a large set of data from C# to Excel

I have code that uses the OpenXML library to export data.
I have 20,000 rows and 22 columns and it takes ages (about 10 minutes).
Is there any solution that would export data from C# to Excel faster? I am doing this from an ASP.NET MVC app and many people's browsers are timing out.
Assuming 20,000 rows and 22 columns at about 100 bytes each, that makes roughly 44 MB of data alone. Plus XML tags, plus formatting, I'd say you end up zipping (.xlsx is nothing but several zipped XML files) around 100 MB of data.
Of course this takes a while, and so does fetching the data.
I recommend you use EPPlus (Excel Package Plus) instead of the Office Open XML SDK.
http://epplus.codeplex.com/
There's probably a bug / performance issue in the write-in-a-hurry-and-hope-it-doesn't-blow-up-too-soon Microsoft code.
CSV. It is a plain text file, but it can be opened by any version of Excel.
It is no doubt the easiest way to export data to Excel; a lot of websites provide data export as CSV.
All you need to do is add a comma (,) to separate the values and a line break to separate the records. Building the CSV file takes hardly any extra resources, so it is quite fast.
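One caveat this glosses over: a value that itself contains a comma, quote or line break has to be quoted, otherwise the file will not open correctly. A minimal sketch (the Escape helper is illustrative):

static string Escape(string value)
{
    // Quote the field and double any embedded quotes, per the usual CSV rules.
    if (value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0)
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    return value;
}

// e.g. writer.WriteLine(string.Join(",", fields.Select(Escape)));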
I wound up using an open source library called ClosedXML, which worked great.
Depending on which version of Excel you are targeting, you could expose the data as an OData service, which Excel 2010 can consume natively and which will handle the downloading and formatting for you.
I am assuming that this data needs to be sent to the client in full and has already been pre-filtered in some fashion, but still has to go back to the person who made the request.
In this case, you want to perform this particular operation asynchronously. I'm not sure if this fits your workflow, but say a person requests this large XML-formatted document; I would: a) queue a worker thread to kick off generating the document, returning a 'token' (perhaps a GUID) to the requester; b) return a link to a page where the requester can click the link (passing the token), allowing the page to look up the results.
If the thread has finished processing the document, it places it in a special folder under a unique name and adds the token to a database table together with the document's location. If the person requests that page and the token exists in the database and the document exists on the file system, they are allowed to click and download it over HTTP. If it does not exist, they are either told it does not exist yet or asked to wait for the results. (This message can be based on the time the original request was received.)
If the person downloads the document successfully (and you can detect this through script), you can remove the database entry for the document with that token and delete the file from the file system.
I hope I read the question correctly.
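A rough sketch of that token-based flow in ASP.NET MVC terms (ReportGenerator, ExportTracker, the folder and the MIME type are all illustrative names, not from the answer):

public ActionResult RequestExport()
{
    Guid token = Guid.NewGuid();
    string path = Server.MapPath("~/App_Data/Exports/" + token + ".xlsx");

    // Kick off generation in the background and hand the token back right away.
    Task.Run(() => ReportGenerator.Generate(path)); // hypothetical generator
    ExportTracker.Add(token, path);                 // hypothetical DB tracking table

    return RedirectToAction("Status", new { id = token });
}

public ActionResult Status(Guid id)
{
    string path = ExportTracker.Find(id);           // hypothetical lookup
    if (path != null && System.IO.File.Exists(path))
        return File(path,
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
            "report.xlsx");

    return View("StillProcessing");                 // "please wait / try again later" page
}

Real code would also record when generation has actually completed rather than relying on File.Exists alone, since the file may still be half-written.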
I have found that I can speed up exporting data from a database into an Excel spreadsheet by limiting the number of write operations. By accumulating 100 rows of data before writing, the creation speed increased by a factor of at least 5-10x.
The mistake most often made when exporting data lies in the workflow:
Build Model
Build XML DOM
Save XML DOM to file
This workflow leads to overhead because building up the XML DOM takes time, the XML DOM is kept in memory together with the model, and then the whole bunch of data is written to a file at once.
A better way to handle this is to convert your model entry by entry directly to the target format and write it straight to a (buffered) file.
A low-overhead format that is fast to write and readable by Excel is CSV (OK, it's legacy, it's awkward...).
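A sketch of that entry-by-entry approach targeting CSV (Record and GetRecords are placeholders; Escape is the quoting helper shown earlier):

// Convert the model one entry at a time and write straight to a buffered file;
// nothing but the current row is kept in memory.
using (var writer = new StreamWriter("report.csv", false, Encoding.UTF8, 64 * 1024))
{
    writer.WriteLine("Id,Name,Amount"); // header row
    foreach (Record r in GetRecords())  // hypothetical data source
        writer.WriteLine(r.Id + "," + Escape(r.Name) + "," + r.Amount);
}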

DotNetZip - Calculate final zip size before calling Save(stream)

When using DotNetZip, is it possible to get what the final zip file size will be before calling Save(stream)? I have a website where users will be downloading fairly large zip files (over 2 GB), and I would like to be able to stream the file to the user rather than buffering the entire file in memory. Something like this...
response.BufferOutput = false;
response.AddHeader("Content-Length", ????);
Is this possible?
If the stream is homogeneous, you could spend some time compressing a 'small' portion up front, calculating the compression ratio and extrapolating from that.
If you mean to set a Content-Length header or something like that, it can only be done when you (1) write a temporary file (advisable anyway if there is any risk of connection trouble and clients requesting specific ranges), or (2) can keep the entire file in memory (presumably only on a 64-bit system with copious memory).
Of course, you could waste enormous resources and just compress the stream twice, but I hope you agree that would be silly.
The way to do what you want is to save the archive to a temporary file on the file system, then stream that file to the user. This lets you compute the size first and then transmit the file.
In this case DotNetZip will not hold the archive in memory.
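A sketch of that temp-file approach (the paths, content type and use of the response object are assumptions):

// Save to a temporary file first so the exact size is known,
// then stream it from disk instead of buffering the archive in memory.
string tempPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".zip");

using (var zip = new Ionic.Zip.ZipFile())
{
    zip.AddDirectory(sourceFolder); // placeholder for whatever is being zipped
    zip.Save(tempPath);
}

var info = new FileInfo(tempPath);
response.BufferOutput = false;
response.ContentType = "application/zip";
response.AddHeader("Content-Length", info.Length.ToString());
response.AddHeader("Content-Disposition", "attachment; filename=download.zip");
response.TransmitFile(tempPath); // streams from disk; remember to delete tempPath afterwards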
