In 64-bit C#, I am trying to populate a FarPoint spreadsheet with approximately 70,000 rows. Loading all of the data onto the spreadsheet takes 3-4 hours, which gives the whole process serious performance problems.
Currently I am populating the spreadsheet cell by cell. Is there anything I can do to improve the performance?
Below is the code template I use to populate the spreadsheet cell by cell.
public void PopulateSpreadsheet()
{
    FarPoint.Win.Spread.FpSpread SS = fpSpread1; // the spread control on the form
    SS.SuspendLayout();

    int rows = 70000;
    for (int i = 0; i < rows; i++)
    {
        // one cell at a time; dataToPopulate stands in for the real values
        SS.ActiveSheet.Cells[i, 0].Text = dataToPopulate[i];
    }

    SS.ResumeLayout();
}
Please guide me on how to improve the performance. Any help is appreciated! Thank you in advance :)
Initially I was storing the data directly in the spreadsheet, which took a lot of time. To get around this, I first stored my data in a DataTable and then used that DataTable as the data source for the spreadsheet.
Put the data into an object array and use the SetArray method. This is extremely fast and also allows different data types in the array.
I use this often, especially when the data comes from SQL.
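For illustration, a rough sketch of that approach, assuming the data already sits in a DataTable named sourceTable, that fpSpread1 is the spread control, and that your FarPoint version exposes the SheetView.SetArray(row, column, object[,]) overload:

// Rough sketch only: sourceTable and fpSpread1 are assumed names, and the
// SetArray overload should be verified against your FarPoint Spread version.
object[,] values = new object[sourceTable.Rows.Count, sourceTable.Columns.Count];
for (int r = 0; r < sourceTable.Rows.Count; r++)
{
    for (int c = 0; c < sourceTable.Columns.Count; c++)
    {
        values[r, c] = sourceTable.Rows[r][c];
    }
}

fpSpread1.SuspendLayout();
// Size the sheet once, then push the whole block in a single call.
fpSpread1.ActiveSheet.RowCount = sourceTable.Rows.Count;
fpSpread1.ActiveSheet.ColumnCount = sourceTable.Columns.Count;
fpSpread1.ActiveSheet.SetArray(0, 0, values);
fpSpread1.ResumeLayout(true);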
I have a giant (100Gb) csv file with several columns and a smaller (4Gb) csv, also with several columns. The first column in both datasets has the same category. I want to create a third csv containing the records of the big file whose first column matches a first-column value in the small csv. In database terms it would be a simple join on the first column.
I am trying to find the best approach in terms of efficiency. As the smaller dataset fits in memory, I was thinking of loading it into a set-like structure, then reading the big file line by line, querying the in-memory set, and writing the line to the output file on a match.
Just to frame the question in SO terms, is there an optimal way to achieve this?
EDIT: This is a one time operation.
Note: the language is not relevant, open to suggestions on column, row oriented databases, python, etc...
Something like
import csv

def main():
    # Build a lookup set from the first column of the small file.
    with open('smallfile.csv', 'rb') as inf:
        in_csv = csv.reader(inf)
        categories = set(row[0] for row in in_csv)

    # Stream the big file and keep only rows whose first column is in the set.
    with open('bigfile.csv', 'rb') as inf, open('newfile.csv', 'wb') as outf:
        in_csv = csv.reader(inf)
        out_csv = csv.writer(outf)
        out_csv.writerows(row for row in in_csv if row[0] in categories)

if __name__ == "__main__":
    main()
I presume you meant 100 gigabytes, not 100 gigabits; most modern hard drives top out around 100 MB/s, so expect it to take around 16 minutes just to read the data off the disk.
If you are only doing this once, your approach should be sufficient. The only improvement I would make is to read the big file in chunks instead of line by line. That way you don't have to hit the file system as much. You'd want to make the chunks as big as possible while still fitting in memory.
If you will need to do this more than once, consider pushing the data into a database. You could insert all the data from the big file and then "update" it using the second, smaller file, to get one large table with all the data. If you use a NoSQL database like Cassandra, this should be fairly efficient, since Cassandra is pretty good at handling writes.
I'm currently using EPPlus to export data to Excel. It works admirably for small amounts of data, but it consumes a lot of memory when exporting large amounts of data.
I've briefly taken a look at OOXML and the Microsoft Open XML SDK 2.5, but I'm not sure whether I can use it to export data to Excel.
There are also third-party libraries.
I wonder which solution can export a very large amount of data with good performance and without taking too much space (ideally less than 3x the amount of data to export).
Update: some extra requirements...
I need to be able to export "color" information (which rules out CSV), and I would like something easy to work with like the EPPlus library (which rules out writing the XML format by hand). I found another thread recommending Aspose or SpreadsheetGear, which I'm trying. I marked the first answer as accepted. Thanks to all.
Update 2016-02-16: just for information... We now use SpreadsheetGear and we love it. We needed support once and it was awesome.
Thanks
A few years ago, I wrote a C# library to export data to Excel using the OpenXML library, and I faced the same situation.
It worked fine until you got to about 30k+ rows, at which point the libraries would try to cache all of your data... and would run out of memory.
However, I fixed the problem by using the OpenXmlWriter class. This writes the data directly into the Excel file (without caching it first) and is much more memory efficient.
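For reference, a minimal sketch of that OpenXmlWriter streaming pattern (this is not the library's actual code; WriteLargeSheet, the sheet name and the cell values are placeholders, and the exact Open XML SDK calls should be checked against the version you use):

using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Sketch: streams rows straight into the worksheet part instead of building
// the whole DOM in memory. Names and values here are placeholders.
static void WriteLargeSheet(string path, int numberOfRows)
{
    using (var doc = SpreadsheetDocument.Create(path, SpreadsheetDocumentType.Workbook))
    {
        WorkbookPart workbookPart = doc.AddWorkbookPart();
        WorksheetPart worksheetPart = workbookPart.AddNewPart<WorksheetPart>();

        using (OpenXmlWriter writer = OpenXmlWriter.Create(worksheetPart))
        {
            writer.WriteStartElement(new Worksheet());
            writer.WriteStartElement(new SheetData());
            for (int i = 1; i <= numberOfRows; i++)
            {
                writer.WriteStartElement(new Row());
                writer.WriteElement(new Cell
                {
                    DataType = CellValues.Number,
                    CellValue = new CellValue(i.ToString())
                });
                writer.WriteEndElement(); // Row
            }
            writer.WriteEndElement(); // SheetData
            writer.WriteEndElement(); // Worksheet
        }

        // The workbook part still needs a sheets entry pointing at the worksheet part.
        workbookPart.Workbook = new Workbook(new Sheets(new Sheet
        {
            Id = workbookPart.GetIdOfPart(worksheetPart),
            SheetId = 1,
            Name = "Data"
        }));
        workbookPart.Workbook.Save();
    }
}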
And, as you'll see, the library is incredibly easy to use, just call one CreateExcelDocument function, and pass it a DataSet, DataTable or List<>:
// Step 1: Create a DataSet, and put some sample data in it
DataSet ds = CreateSampleData();
// Step 2: Create the Excel .xlsx file
try
{
    string excelFilename = "C:\\Sample.xlsx";
    CreateExcelFile.CreateExcelDocument(ds, excelFilename);
}
catch (Exception ex)
{
    MessageBox.Show("Couldn't create Excel file.\r\nException: " + ex.Message);
    return;
}
You can download the full source code for C# and VB.Net from here:
Mike's Export to Excel
Good luck !
If your requirements are simple enough, you can just use CSV.
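For the simple CSV route, a minimal sketch along these lines may be enough (the Escape helper is illustrative, not from any library):

using System;
using System.Data;
using System.IO;
using System.Linq;

// Illustrative only: dumps a DataTable to CSV one row at a time, so nothing
// beyond the current row is held in memory. Escape is a hypothetical helper.
static void ExportToCsv(DataTable table, string path)
{
    using (var writer = new StreamWriter(path))
    {
        writer.WriteLine(string.Join(",",
            table.Columns.Cast<DataColumn>().Select(c => Escape(c.ColumnName))));
        foreach (DataRow row in table.Rows)
        {
            writer.WriteLine(string.Join(",", row.ItemArray.Select(Escape)));
        }
    }
}

static string Escape(object value)
{
    var s = Convert.ToString(value) ?? string.Empty;
    // Quote fields containing separators, quotes or line breaks.
    return s.IndexOfAny(new[] { ',', '"', '\n', '\r' }) >= 0
        ? "\"" + s.Replace("\"", "\"\"") + "\""
        : s;
}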
If you need more detail, look into SpreadsheetML. It's an XML schema that you can use to create a text document that Excel can open natively. It supports formulas, multiple worksheets per workbook, formatting, etc.
I second using CSV, but note that Excel has limits on the number of rows and columns in a worksheet, as described here:
http://office.microsoft.com/en-us/excel-help/excel-specifications-and-limits-HP010342495.aspx
specifically:
Worksheet size 1,048,576 rows by 16,384 columns
This is for Excel 2010. Keep these limits in mind when working with very large amounts of data.
As an alternative you can use my SwiftExcel library. It was designed for high-volume Excel output that writes data directly to the file with no memory impact.
Here is a sample of usage:
using (var ew = new ExcelWriter("C:\\temp\\test.xlsx"))
{
    for (var row = 1; row <= 100; row++)
    {
        for (var col = 1; col <= 10; col++)
        {
            ew.Write($"row:{row}-col:{col}", col, row);
        }
    }
}
Good day.
I am asking for a bit of advice on what others' experiences and pitfalls have been. I am a SQL developer, but I need to write a front end using C#.
I am returning a query from an MSSQL database via a stored procedure and putting it into a DataTable. There are about 140k rows in the result set. I am using standard calls with a DataReader to return the result set. No binding.
What I would like to do is return parts of the DataTable to a datagrid on a form, let the user manipulate the data in the grid, save it back to the DataTable, and then collect the next part of the DataTable and manipulate that. I don't want to pull things into the DataTable in segments, as I need to update calculations on the entire DataTable when a change is made.
And then finally save the changes back to the database, when done.
If anyone can point me to a the best and most efficient way it would be greatly appreciated.
Thank you in advance
Scott
I'm looking for a good way to export an IEnumerable to Excel 2007 (.xlsb).
The T is a known type, so reflection is not completely necessary for performance reasons.
I'm using .xlsb (excel binary format) because the amount of data will be large for Excel.
The IEnumerable in question has approximately 2 million records. The IEnumerable is retrieved from an Access database (.mdb), then goes through some processing, and finally LINQ queries are written to generate a report structure for T. These records do not need to be sent to Excel as one set (nor could they be); the data will be sub-divided by a condition, with the largest subset being roughly 1 million records.
I want to be able to convert the data to an Excel Pivot Table for easy viewing.
My initial idea was to convert the IEnumerable to a 2D array (object[,]) and then push it into an Excel range using COM interop.
public static object[,] To2DArray<T>(this IEnumerable<T> objectList)
{
    Type t = typeof(T);
    PropertyInfo[] fields = t.GetProperties();

    // Note: Count() enumerates the sequence once before the foreach enumerates it again.
    object[,] my2DObject = new object[objectList.Count(), fields.Count()];

    int row = 0;
    foreach (var o in objectList)
    {
        int col = 0;
        foreach (var f in fields)
        {
            my2DObject[row, col] = f.GetValue(o, null) ?? string.Empty;
            col++;
        }
        row++;
    }
    return my2DObject;
}
I then took that object[,] and did a "transaction split", as I called it, which just splits the object[,] into smaller chunks stored in a List<object[,]>, and then sends each chunk into an Excel range using something similar to:
Excel.Range range = worksheet.get_Range(cell, cell);
range.Value2 = chunks[0]; // chunks being the List<object[,]>
I'd obviously loop over the above, but for simplicity that is the idea.
This works, but it takes an enormous amount of time to process: over 30 minutes.
I've also dabbled in outputting the IEnumerable to CSV, but that is not very efficient either, since it first requires the .csv file to be created and then opened via COM interop to do the Excel pivot-table formatting.
My question: Is there a better (preferred) way to do this?
Should I force execution (ToList()) before iteration?
Should I use a different mechanism to output/display the data?
I'm open to any options to get a disconnected IEnumerable out to file in an efficient manner.
I wouldn't be opposed to using something like SQL Express.
The main question will be where the bottleneck is. I'd have a look at the code in a profiler to see what part of the execution is taking a long time. It can also be worthwhile to look at your resource usage while the process runs and see whether there is a shortage of CPU or memory, or whether it is disk-bound.
If you're getting sensible performance doing 2000 records at a time, then I suspect memory resources may be an issue - with the code you posted you're converting an IEnumerable (which can avoid loading a complete dataset into memory) into an entirely in-memory structure with potentially a million records - depending on the size and number of fields involved, this could easily become an issue.
If the problem looks like the time to create the Excel file itself (which it doesn't immediately sound like it is in this case), then COM interop calls can add up, and some of the 3rd party Excel libraries aim to be much faster at writing Excel files, particularly with large numbers of records, so rather than necessarily use Excel Binary format and COM, I'd suggest looking at an Open Source library like EPPlus (http://epplus.codeplex.com/) and seeing what the performance difference is like.
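As a rough illustration of the library route, EPPlus can load a whole collection into a range in one call, which avoids the per-cell interop chatter. A sketch, assuming records stands for the IEnumerable<T> in question and that your EPPlus version exposes LoadFromCollection (newer versions also require a license context):

using System.Collections.Generic;
using System.IO;
using OfficeOpenXml; // EPPlus

// Hypothetical sketch: "records" stands in for the IEnumerable<T> from the question.
static void ExportWithEpplus<T>(IEnumerable<T> records, string path)
{
    using (var package = new ExcelPackage())
    {
        ExcelWorksheet sheet = package.Workbook.Worksheets.Add("Data");
        // true = write a header row taken from the property names of T
        sheet.Cells["A1"].LoadFromCollection(records, true);
        package.SaveAs(new FileInfo(path));
    }
}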
I need to handle very large datatables (2 million rows+) that come from databases (SQL, Oracle, Access, MySQL, SharePoint etc.) outside of my control. Currently I loop through every row and column building a string object, but I run out of memory at about 100k rows.
The only solution I can see is to break the datatable into smaller pieces and persist each block before starting on the next block of rows.
Since I cannot add ROW_NUMBER() or anything similar, I have to handle the populated datatable.
How can I easily break the populated datatable into smaller datatables, like paging, while keeping performance in mind?
PS there is no visual component to this functionality.
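For the paging part, one possible sketch using LINQ to DataSet (pageIndex and pageSize are placeholders, and this copies the page rows into a new DataTable):

using System.Data;
using System.Linq;

// Illustrative sketch of slicing an already-populated DataTable into pages.
static DataTable GetPage(DataTable source, int pageIndex, int pageSize)
{
    var rows = source.AsEnumerable()
                     .Skip(pageIndex * pageSize)
                     .Take(pageSize)
                     .ToList();
    // CopyToDataTable throws on an empty sequence, so fall back to an empty clone.
    return rows.Count > 0 ? rows.CopyToDataTable() : source.Clone();
}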
Are you using string concatenation, like this: string += string?
Change that to a StringBuilder and you should not have problems, at least not for 20k rows.
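A minimal illustration of the difference (BuildOutput and the tab-separated layout are just placeholders):

using System.Data;
using System.Text;

// Illustrative only: appends to a StringBuilder instead of concatenating strings,
// so each append copies only the new text rather than the whole string built so far.
static string BuildOutput(DataTable table)
{
    var sb = new StringBuilder();
    foreach (DataRow row in table.Rows)
    {
        foreach (DataColumn col in table.Columns)
        {
            sb.Append(row[col]).Append('\t');
        }
        sb.AppendLine();
    }
    return sb.ToString();
}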
If you are talking about filling a DataTable object (which loads the results of your calls into memory before processing), you will likely be better off using a DataReader for each of the mentioned providers, so that you can process each row as it is read from the database instead of storing the whole DataTable in memory...
A great answer to another question lists the pro/cons of datareaders/datatables
If you're already using datareaders, ignore this. But your memory problem might also come from storing the retrieved results...
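A rough sketch of that streaming approach with SqlClient (the connection string, query and ProcessRow handler are placeholders):

using System.Data.SqlClient;

// Sketch of the streaming alternative: handle each row as it arrives instead of
// filling a DataTable first.
static void StreamRows(string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("SELECT * FROM SomeLargeTable", connection))
    {
        connection.Open();
        using (SqlDataReader reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                ProcessRow(reader); // hypothetical per-row handler
            }
        }
    }
}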