I am using the EPPlus library to create an Excel workbook with many worksheets. I was wondering if it is safe to build the worksheets in parallel. I could not find any mention in the (limited) documentation of whether the library supports this kind of usage:
package = new ExcelPackage();
int start = 1;
int end = 100;
Parallel.For(start, end, s =>
{
var worksheet = package.Workbook.Worksheets.Add("Worksheet" + s.ToString());
//routine to populate data here
});
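Take a quick look at the source code: https://github.com/JanKallman/EPPlus/blob/master/EPPlus/ExcelWorksheets.cs
As you can see, the Add method calls the AddSheet method, which uses lock(...) to make the operation thread-safe, so yes, you can call Add from Parallel.For. Note that this lock only guards adding the sheet; by itself it says nothing about whether your population routine is safe to run concurrently.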
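If you would rather not depend on the library's internal locking at all, one conservative pattern is to do the expensive data preparation in parallel and touch the package from a single thread only. This is a minimal sketch, not something taken from the EPPlus documentation: it assumes EPPlus 4.x (the OfficeOpenXml namespace), and BuildRows and the output file name are hypothetical stand-ins for your own routine.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using OfficeOpenXml;

static class ParallelSheets
{
    // Hypothetical stand-in for the real data routine; it touches no EPPlus objects.
    public static List<object[]> BuildRows(int sheetNumber)
    {
        return Enumerable.Range(1, 10)
            .Select(r => new object[] { sheetNumber, r, "row " + r })
            .ToList();
    }

    static void Main()
    {
        // Compute the data for sheets 1..99 in parallel (same bounds as Parallel.For(1, 100, ...)).
        List<object[]>[] data = Enumerable.Range(1, 99)
            .AsParallel().AsOrdered()
            .Select(n => BuildRows(n))
            .ToArray();

        // Only this thread ever touches the package.
        using (var package = new ExcelPackage())
        {
            for (int i = 0; i < data.Length; i++)
            {
                var ws = package.Workbook.Worksheets.Add("Worksheet" + (i + 1));
                ws.Cells["A1"].LoadFromArrays(data[i]);
            }
            package.SaveAs(new FileInfo("parallel-demo.xlsx"));
        }
    }
}
```

Whether this is faster than the plain Parallel.For version depends on how much of the time goes into computing the data versus writing cells.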
I use Spreadsheetgear to export the results of custom SQL queries as excel files.
Now I want to improve this system: The user will be able to upload an excel template file into the database (currently as varbinary). For example, it could have one worksheet with calculations, then when exporting data into that template it'll fill a different worksheet with the datatable from the query.
Can spreadsheetgear do this? If so, how does it work - mainly how can I load an existing excel file as a Spreadsheetgear workbook/workbookset? I could not find anything in their documentation (though I am still looking).
Edit: Solved.
I create the workbook manually, load the template from the database as a byte[], then open said template with the OpenFromMemory function:
// Create workbookSet
SpreadsheetGear.IWorkbookSet workbookSet = SpreadsheetGear.Factory.GetWorkbookSet();
// Create a new empty workbook in the workbookSet.
SpreadsheetGear.IWorkbook workbook = workbookSet.Workbooks.Add();
if(TemplateID != -1) // If this case requires a template
{
// Get template from SQL database (.xlsx stored as varbinary(max))
byte[] template = GetTemplateByID(TemplateID);
workbook = workbookSet.Workbooks.OpenFromMemory(template);
}
// Create export worksheet
SpreadsheetGear.IWorksheet worksheet = workbook.Worksheets[0];
worksheet.Name = "Export";
[...]
Templates always use the Worksheet[1] in my case, but it should be easy to create a Worksheet[1] for the export.
Yes, it is possible to set up pre-constructed template files using SpreadsheetGear. We use this extensively with .NET / C# / MSSQL. The method allows you to create quite sophisticated templates and then simply add the required data. This of course includes any calculations you build into the template.
Method 1: Store the template on a web server, open it, and write the resulting user spreadsheet to a folder on the web server. Return the filename so the file can be retrieved by code or by the user from the server.
public static String SaveTemplateSpreadsheetToServer()
{
// Open the workbook.
var templatename = HostingEnvironment.MapPath("~/Files/MyTemplate.xlsx");
var workbook = Factory.GetWorkbook(templatename );
// Read and write to the spreadsheet
// Save a copy to disk and return filename
var filename = "The_exported_file.xlsx";
var filePath = HostingEnvironment.MapPath("~/FilesTemp/" + filename);
workbook.SaveAs(filePath, FileFormat.OpenXMLWorkbook);
// close workbook
workbook.Close();
// Return the filename
return filename;
}
Method 2: Store the template on a web server, open it, and save the modified spreadsheet as a byte array. Send it to the user directly as an attachment.
public static byte[] SaveTemplateSpreadsheetToServer()
{
// Open the workbook.
var templatename = HostingEnvironment.MapPath("~/Files/MyTemplate.xlsx");
var workbook = Factory.GetWorkbook(templatename );
// Read and write to the spreadsheet
// Save as byte array and send to user
var byteArray = workbook.SaveToMemory(FileFormat.OpenXMLWorkbook);
// close workbook
workbook.Close();
// Return the byte array
return byteArray;
}
We have done some work with binary template files saved in a database but find it more convenient to work with physical template files on a web server. It is easier to manage changes to the template.
My only caution is to avoid working with very big templates that have lots of "junk" in them (e.g. images). The process becomes affected by the time it takes to load the file into memory prior to the read / write / export activity. Less than 1MB is ideal and less than 2MB is manageable.
I am trying to export data from my C# code to MS Excel 2007, but it takes 30 seconds to insert the data into an Excel file. The code looks like this:
Excel.Application excelapp = new Excel.Application();
Excel.Workbook excelworkbook = excelapp.Workbooks.Open(fileTest);
Excel.Sheets excelsheets = excelworkbook.Worksheets;
Excel.Worksheet mysheets = (Excel.Worksheet)excelsheets.get_Item("Sheet1");
Excel.Range mycells = mysheets.Cells;
mycells.Item[destroyer, "A"].Value = s[2];
mycells.Item[destroyer, "B"].Value = s[1];
mycells.Item[destroyer, "C"].Value = s[3];
mycells.Item[destroyer, "D"].Value = dbl_standard.Text;
mycells.Item[destroyer, "E"].Value = s[4];
mycells.Item[destroyer, "F"].Value = s[7];
mycells.Item[destroyer, "G"].Value = s[5];
mycells.Item[destroyer, "H"].Value = s[6];
excelworkbook.Save();
excelworkbook.Close();
excelapp.Quit();
Marshal.ReleaseComObject(mycells);
Marshal.ReleaseComObject(mysheets);
Marshal.ReleaseComObject(excelsheets);
Marshal.ReleaseComObject(excelworkbook);
Marshal.ReleaseComObject(excelapp);
I am inserting barely 25 columns. What am I doing wrong? How can I make it faster?
Thanks in Advance
You have two issues going on here. The first issue is that Excel interop actually starts Excel.exe and interoperates with that process. You won't be able to remove the overhead of starting Excel, which is probably the bulk of your processing time.
The other part is that for every cell you edit you create a lot of calls "under the hood" to the interop layer. You can vectorize these calls.
For reading:
https://stackoverflow.com/a/42604291/3387223
For writing (VB example):
https://stackoverflow.com/a/23503305/3387223
That way you only create one interop operation for the whole range of values. This will be roughly 25 times quicker than inserting 25 values one at a time.
But as I stated above, starting Excel is probably what takes most of your time.
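Since the linked write example is in VB, here is a rough C# sketch of the same idea: copy the values into a two-dimensional array and assign it to a range of the same size in one call. The class and method names (BulkWrite, ToGrid, Write) are made up for illustration, and the code assumes a reference to the Microsoft.Office.Interop.Excel assembly.

```csharp
using System.Collections.Generic;
using Excel = Microsoft.Office.Interop.Excel;

static class BulkWrite
{
    // Copies the rows into a rectangular object[,]; short rows leave null (empty) cells.
    public static object[,] ToGrid(IList<string[]> rows, int columns)
    {
        var grid = new object[rows.Count, columns];
        for (int r = 0; r < rows.Count; r++)
            for (int c = 0; c < columns && c < rows[r].Length; c++)
                grid[r, c] = rows[r][c];
        return grid;
    }

    // Writes the whole grid starting at column A of firstRow with a single assignment.
    public static void Write(Excel.Worksheet sheet, int firstRow, object[,] grid)
    {
        Excel.Range topLeft = (Excel.Range)sheet.Cells[firstRow, 1];
        Excel.Range target = topLeft.get_Resize(grid.GetLength(0), grid.GetLength(1));
        target.Value2 = grid; // one interop call instead of one per cell
    }
}
```

In the question's code this would replace the eight separate mycells.Item[...] assignments with something like BulkWrite.Write(mysheets, destroyer, BulkWrite.ToGrid(rows, 8)), where rows holds one string[] per spreadsheet row.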
You can read and write Excel sheets faster with OpenXML, but maybe you'll run into some formatting issues, and you won't get instant updates of other formulas in your Excel sheet (if that's what you need).
https://msdn.microsoft.com/en-us/us-en/library/office/bb448854.aspx
Here's an example on generating Excel sheets:
https://msdn.microsoft.com/en-us/library/office/hh180830(v=office.14).aspx
And if you want an easier time dealing with OpenXml there is ClosedXml:
https://github.com/closedxml/closedxml
Which will make OpenXml about as easy as standard interop.
I am writing an application to open an Excel sheet and read it
MyApp = new Excel.Application();
MyBook = MyApp.Workbooks.Open(filename);
MySheet = (Excel.Worksheet)MyBook.Sheets[1]; // Explicit cast is not required here
lastRow = MySheet.Cells.SpecialCells(Excel.XlCellType.xlCellTypeLastCell).Row;
MyApp.Visible = false;
This takes about 6-7 seconds. Is that normal with Excel interop?
Also, is there a quicker way to read an Excel file than this?
string[] xx = new string[lastRow];
for (int index = 1; index <= lastRow; index++)
{
int maxCol = endCol - startCol;
for (int j = 1; j <= maxCol; j++)
{
try
{
xx[index - 1] += (MySheet.Cells[index, j] as Excel.Range).Value2.ToString();
}
catch
{
}
if (j != maxCol) xx[index - 1] += "|";
}
}
MyApp.Quit();
System.Runtime.InteropServices.Marshal.ReleaseComObject(MySheet);
System.Runtime.InteropServices.Marshal.ReleaseComObject(MyBook);
System.Runtime.InteropServices.Marshal.ReleaseComObject(MyApp);
Appending to the answer of @RvdK: yes, COM interop is slow.
Why is it slow?
It is slow because of how it works. Every call made from .NET must be marshaled to a local COM proxy; from there it must be marshaled from one process (your app) to the COM server (Excel) through IPC inside the Windows kernel. It is then dispatched from the server's local proxy into native code, where the arguments are marshaled from OLE Automation compatible types into native types, their validity is checked, and the function is performed. The result of the function travels back approximately the same way, through several layers between two different processes.
So each and every command is quite expensive to execute; the more of them you issue, the slower the whole process is. You can find lots of documentation around the web, as COM is an old and well-established standard (though it has been slowly dying along with Visual Basic 6).
One example of such article is here: http://www.codeproject.com/Articles/990/Understanding-Classic-COM-Interoperability-With-NE
Is there a quicker way to read?
ClosedXML can both read and write Excel xlsx files (even formulas, formatting and stuff) using Microsoft's OpenXml SDK, see here: https://closedxml.codeplex.com/wikipage?title=Finding%20and%20extracting%20the%20data&referringTitle=Documentation
Excel Data Reader claims to be able to read both legacy and new Excel data files. I did not try it myself; take a look here: https://exceldatareader.codeplex.com/
Another way to read data faster is to use Excel automation to convert the sheet into a data file that you can parse easily and batch process without the interop layer (e.g. XML, CSV). This answer shows how to do it
Short answer: correct, interop is slow. (I had the same problem: it took a couple of seconds to read 300 lines...)
Use a library for this:
http://epplus.codeplex.com/
http://npoi.codeplex.com/
This answer is only about the second part of your question.
You are creating a lot of Range objects there, one per cell, which is not how the API is meant to be used and is indeed very slow.
First read the complete range and then iterate over the result like so:
object[,] xx = (object[,])MySheet.get_Range("A1", "XX100").Value2;
// Arrays returned by Excel are 1-based, and empty cells come back as null.
for (int i = 1; i <= xx.GetLength(0); i++)
{
for (int j = 1; j <= xx.GetLength(1); j++)
{
Console.WriteLine(Convert.ToString(xx[i, j]));
}
}
This will be much faster!
You can use this free library, xls & xlsx supported,
Workbook wb = new Workbook();
wb.LoadFromFile(ofd.FileName);
https://freenetexcel.codeplex.com/
I have some tabular data that I'd like to turn into an Excel table.
Software available:
.NET 4 (C#)
Excel 2010 (using the Excel API is OK)
I prefer not to use any 3rd party libraries
Information about the data:
A couple million rows
5 columns, all strings (very simple and regular table structure)
In my script I'm currently using a nested List data structure but I can change that
Performance of the script is not critical
Searching online gives many results, and I'm confused whether I should use OleDb, ADO RecordSets, or something else. Some of these technologies seem like overkill for my scenario, and some seem like they might be obsolete.
What is the very simplest way to do this?
Edit: this is a one-time script I intend to run from my attended desktop.
Avoid using COM interop at all costs. Use a third-party API. Really. In fact, if you're doing this server-side, you virtually have to. There are plenty of free options. I highly recommend using EPPlus, but there are also enterprise-level solutions available. I've used EPPlus a fair amount, and it works great. Unlike interop, it allows you to generate Excel files without requiring Excel to be installed on the machine, which means you also don't have to worry about COM objects sticking around as background processes. Even with proper object disposal, the Excel processes don't always end.
http://epplus.codeplex.com/releases/view/42439
I know you said you want to avoid third-party libraries, but they really are the way to go. Microsoft does not recommend automating Office. It's really not meant to be automated anyway.
http://support.microsoft.com/kb/257757
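However, you may want to reconsider inserting "a couple million rows" into a single spreadsheet: a worksheet in Excel 2007 or later holds at most 1,048,576 rows, so the data would have to be split across several sheets.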
Honoring your request to avoid 3rd party tools and using COM objects, here's how I'd do it.
Add reference to project: Com object
Microsoft Excel 11.0.
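Top of module add:
using System.Data.SqlClient;
using System.Reflection;
using Microsoft.Office.Interop.Excel;
Add event logic like this:
private void DoThatExcelThing()
{
Microsoft.Office.Interop.Excel.Application myExcel;
try
{
// Attach to a running Excel instance if there is one
myExcel = (Microsoft.Office.Interop.Excel.Application)System.Runtime.InteropServices.Marshal.GetActiveObject("Excel.Application");
}
catch (System.Runtime.InteropServices.COMException)
{
// Otherwise start a new instance
myExcel = new Microsoft.Office.Interop.Excel.Application();
}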
myExcel.Visible = true;
Workbook wb1 = myExcel.Workbooks.Add("");
Worksheet ws1 = (Worksheet)wb1.Worksheets[1];
//Read the connection string from App.Config
string strConn = System.Configuration.ConfigurationManager.ConnectionStrings["NewConnString"].ConnectionString;
//Open a connection to the database
SqlConnection myConn = new SqlConnection();
myConn.ConnectionString = strConn;
myConn.Open();
//Establish the query
SqlCommand myCmd = new SqlCommand("select * from employees", myConn);
SqlDataReader myRdr = myCmd.ExecuteReader();
//Read the data and put into the spreadsheet.
int j = 3;
while (myRdr.Read())
{
for (int i=0 ; i < myRdr.FieldCount; i++)
{
ws1.Cells[j, i+1] = myRdr[i].ToString();
}
j++;
}
//Populate the column names
for (int i = 0; i < myRdr.FieldCount ; i++)
{
ws1.Cells[2, i+1] = myRdr.GetName(i);
}
myRdr.Close();
myConn.Close();
//Add some formatting
Range rng1 = ws1.get_Range("A1", "H1");
rng1.Font.Bold = true;
rng1.Font.ColorIndex = 3;
rng1.HorizontalAlignment = XlHAlign.xlHAlignCenter;
Range rng2 = ws1.get_Range("A2", "H50");
rng2.WrapText = false;
rng2.EntireColumn.AutoFit();
//Add a header row
ws1.get_Range("A1", "H1").EntireRow.Insert(XlInsertShiftDirection.xlShiftDown, Missing.Value);
ws1.Cells[1, 1] = "Employee Contact List";
Range rng3 = ws1.get_Range("A1", "H1");
rng3.Merge(Missing.Value);
rng3.Font.Size = 16;
rng3.Font.ColorIndex = 3;
rng3.Font.Underline = true;
rng3.Font.Bold = true;
rng3.VerticalAlignment = XlVAlign.xlVAlignCenter;
//Save and close
string strFileName = String.Format("Employees{0}.xlsx", DateTime.Now.ToString("HHmmss"));
System.IO.File.Delete(strFileName);
wb1.SaveAs(strFileName, XlFileFormat.xlWorkbookDefault, Missing.Value, Missing.Value, Missing.Value, Missing.Value,
XlSaveAsAccessMode.xlExclusive, Missing.Value, false, Missing.Value, Missing.Value, Missing.Value);
myExcel.Quit();
}
Some things for your consideration...
If this is a client side solution, there is nothing wrong with using Interops.
If this is a server side solution, don't use Interops. A good alternative is the OpenXML SDK from Microsoft if you don't want a 3rd party solution. It's free. I believe the latest one has an object model similar to Excel's. It's a lot faster, A LOT, at generating the workbook than going the interop way, which can bog down your server.
I once read that the easiest way to create an Excel table is to actually write an HTML table, including its structure and data, and simply give the file an .xls extension.
Excel will be able to convert it, but it will display a warning saying that the content does not match the extension.
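As a rough illustration of that trick, the sketch below renders rows as an HTML table and saves the text under an .xls name. It uses only the .NET base class library; the class name, sample data and output file name are placeholders, and Excel will still show the extension warning mentioned above when opening the file.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

static class HtmlTableExport
{
    // Renders the rows as an HTML table; the first row is treated as the header.
    public static string ToHtmlTable(IEnumerable<string[]> rows)
    {
        var sb = new StringBuilder("<table>");
        bool header = true;
        foreach (string[] row in rows)
        {
            sb.Append("<tr>");
            string tag = header ? "th" : "td";
            foreach (string cell in row)
            {
                // Encode so that <, > and & in the data do not break the markup.
                sb.Append('<').Append(tag).Append('>')
                  .Append(WebUtility.HtmlEncode(cell))
                  .Append("</").Append(tag).Append('>');
            }
            sb.Append("</tr>");
            header = false;
        }
        return sb.Append("</table>").ToString();
    }

    static void Main()
    {
        var rows = new List<string[]>
        {
            new[] { "Name", "City" },
            new[] { "Ann & Bob", "Oslo" }
        };
        // The file contains HTML, not a real workbook; only the extension says .xls.
        File.WriteAllText("export.xls", ToHtmlTable(rows), Encoding.UTF8);
    }
}
```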
I agree that a 3rd party dll would be cleaner than the com, but if you go the interop route...
Hands down, the best way to populate an Excel sheet is to first put the data in a two-dimensional string array, then get an Excel range object with the same dimensions and assign the array to it in one call (by setting range.Value2 to the array). Using any other method is hideously slow.
Also be sure you use the appropriate cleanup code in your finally block.
I implemented "export to Excel" with the MS Access OLE DB driver, which can also read and write Excel files, in the following way:
Preparation (done once)
Create an Excel file that contains everything (headers, formatting, formulas, diagrams) with an empty data area, as a template to be filled
Give the data area (including the headers) a name (e.g. "MyData")
Implementing export
Copy the template file to the destination folder
Open an OLE DB database connection to the destination file
Use SQL to insert the data
Example
Excel table with Named area "MyData"
Name, FamilyName, Birthday
open System.Data.OleDb.OleDbConnection
execute sql "Insert into MyData(Name, FamilyName, Birthday) values(...)"
I used this connection string
private const string FORMAT_EXCEL_CONNECT =
// @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=""Excel 8.0;HDR={1}""";
@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0;HDR={1}""";
private static string GetExcelConnectionString(string excelFilePath, bool header)
{
return string.Format(FORMAT_EXCEL_CONNECT,
excelFilePath,
(header) ? "Yes" : "No"
);
}
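The insert step described above could look roughly like the sketch below. It is an illustration under assumptions rather than the exact code I used: the helper names (BuildInsertSql, InsertRow) are hypothetical, the connection string is expected to come from a method like GetExcelConnectionString, and the ACE OLE DB provider must be installed on the machine.

```csharp
using System;
using System.Data.OleDb;
using System.Linq;

static class OleDbExcelExport
{
    // Builds "INSERT INTO [range] ([c1], [c2]) VALUES (?, ?)" with positional placeholders.
    public static string BuildInsertSql(string rangeName, params string[] columns)
    {
        string cols = string.Join(", ", columns.Select(c => "[" + c + "]"));
        string marks = string.Join(", ", columns.Select(c => "?"));
        return "INSERT INTO [" + rangeName + "] (" + cols + ") VALUES (" + marks + ")";
    }

    // Appends one row to the named range in the copied template file.
    public static void InsertRow(string connectionString, string rangeName,
                                 string[] columns, object[] values)
    {
        using (var conn = new OleDbConnection(connectionString))
        using (var cmd = new OleDbCommand(BuildInsertSql(rangeName, columns), conn))
        {
            // OLE DB binds parameters by position; the names are only labels.
            for (int i = 0; i < values.Length; i++)
                cmd.Parameters.AddWithValue("p" + i, values[i] ?? DBNull.Value);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```

For the example table this would be called as InsertRow(connectionString, "MyData", new[] { "Name", "FamilyName", "Birthday" }, new object[] { "Ann", "Smith", new DateTime(1990, 1, 1) }).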
I want to read an Excel file, but doing it this way is too slow. What pattern should I use to read an Excel file faster? Should I try CSV?
I am using the following code:
ApplicationClass excelApp = new ApplicationClass();
Workbook myWorkBook = excelApp.Workbooks.Open(@"C:\Users\OWNER\Desktop\Employees.xlsx");
Worksheet mySheet = (Worksheet)myWorkBook.Sheets["Sheet1"];
for (int row = 1; row <= mySheet.UsedRange.Rows.Count; row++)
{
for (int col = 1; col <= mySheet.UsedRange.Columns.Count; col++)
{
Range dataRange = (Range)mySheet.Cells[row, col];
Console.Write(Convert.ToString(dataRange.Value2) + " ");
}
Console.WriteLine();
}
excelApp.Quit();
The reason your program is slow is that you are using Excel itself to open your Excel files. Whenever you do anything with the file you go through COM interop, which is extremely slow because every call has to cross between two different processes.
Microsoft does not recommend automating Office from unattended or server-side code, and it provides the OpenXML library for reading and writing .xlsx files without Excel.
I suggest you use a wrapper library for OpenXML, since the API is pretty hairy. You can check out this SO question for how to use it correctly:
open xml reading from excel file
You're accessing the Excel file through Excel interop. By reading cell by cell you're making a lot of cross-process COM calls, which is not very performant.
You can read data in ranges, not cell by cell. This loads the data into memory and you could iterate it much faster. (Eg. try to load column by column.)
BTW: You could use some library instead like http://epplus.codeplex.com which reads excel files directly.
Excel Data Reader
Lightweight and very fast if reading is your only concern.