I have some tabular data that I'd like to turn into an Excel table.
Software available:
.NET 4 (C#)
Excel 2010 (using the Excel API is OK)
I prefer not to use any 3rd party libraries
Information about the data:
A couple million rows
5 columns, all strings (very simple and regular table structure)
In my script I'm currently using a nested List data structure but I can change that
Performance of the script is not critical
Searching online gives many results, and I'm confused whether I should use OleDb, ADO RecordSets, or something else. Some of these technologies seem like overkill for my scenario, and some seem like they might be obsolete.
What is the very simplest way to do this?
Edit: this is a one-time script I intend to run from my attended desktop.
Avoid using COM interop at all costs. Use a third-party API. Really. In fact, if you're doing this server-side, you virtually have to. There are plenty of free options. I highly recommend using EPPlus, but there are also enterprise-level solutions available. I've used EPPlus a fair amount, and it works great. Unlike interop, it allows you to generate Excel files without requiring Excel to be installed on the machine, which means you also don't have to worry about COM objects sticking around as background processes. Even with proper object disposal, the Excel processes don't always end.
http://epplus.codeplex.com/releases/view/42439
I know you said you want to avoid third-party libraries, but they really are the way to go. Microsoft does not recommend or support automating Office from unattended, server-side code (see the KB article below); Office is really designed to be driven by a user.
http://support.microsoft.com/kb/257757
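However, you may want to reconsider inserting "a couple million rows" into a single spreadsheet: since Excel 2007 a worksheet holds at most 1,048,576 rows, so that much data has to be split across several sheets or files.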
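The splitting arithmetic is simple. A minimal sketch, assuming the 1,048,576-row limit of .xlsx worksheets and one header row per sheet; `SheetSplitter` and its methods are hypothetical names, not part of any library:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits a large row set into worksheet-sized chunks.
public static class SheetSplitter
{
    // Maximum rows per worksheet in Excel 2007 and later (.xlsx).
    public const int MaxRowsPerSheet = 1048576;

    // Data rows that fit on one sheet when row 1 holds the column headers.
    public const int DataRowsPerSheet = MaxRowsPerSheet - 1;

    // Number of worksheets needed for the given number of data rows.
    public static int SheetsNeeded(long dataRows)
    {
        if (dataRows < 0) throw new ArgumentOutOfRangeException("dataRows");
        if (dataRows == 0) return 1; // an empty table still needs one sheet
        return (int)((dataRows + DataRowsPerSheet - 1) / DataRowsPerSheet);
    }

    // Yields consecutive chunks of at most chunkSize rows, one per worksheet.
    public static IEnumerable<List<T>> Chunk<T>(IEnumerable<T> rows, int chunkSize)
    {
        var chunk = new List<T>();
        foreach (T row in rows)
        {
            chunk.Add(row);
            if (chunk.Count == chunkSize)
            {
                yield return chunk;
                chunk = new List<T>();
            }
        }
        if (chunk.Count > 0) yield return chunk; // last, partially filled sheet
    }
}
```

Each chunk from `Chunk(rows, SheetSplitter.DataRowsPerSheet)` would then go to its own worksheet, whichever writing technique you pick.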
Honoring your request to avoid 3rd party tools and using COM objects, here's how I'd do it.
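Add a reference to the COM object "Microsoft Excel 11.0 Object Library" (with Excel 2010 it is listed as "Microsoft Excel 14.0 Object Library"), and a reference to System.Configuration.
At the top of the file, add:
using System;
using System.Configuration;
using System.Data.SqlClient;
using System.Reflection;
using System.Runtime.InteropServices;
using Microsoft.Office.Interop.Excel;
Then add a method like this: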
private void DoThatExcelThing()
{
    Application myExcel;
    try
    {
        // Attach to an Excel instance that is already running...
        myExcel = (Application)Marshal.GetActiveObject("Excel.Application");
    }
    catch (COMException)
    {
        // ...or start a new one if none is running
        myExcel = new Application();
    }
    myExcel.Visible = true;
    Workbook wb1 = myExcel.Workbooks.Add(Missing.Value);
    Worksheet ws1 = (Worksheet)wb1.Worksheets[1];

    //Read the connection string from App.Config
    string strConn = ConfigurationManager.ConnectionStrings["NewConnString"].ConnectionString;

    //Open a connection to the database and run the query
    using (SqlConnection myConn = new SqlConnection(strConn))
    using (SqlCommand myCmd = new SqlCommand("select * from employees", myConn))
    {
        myConn.Open();
        using (SqlDataReader myRdr = myCmd.ExecuteReader())
        {
            //Populate the column names in row 2
            for (int i = 0; i < myRdr.FieldCount; i++)
            {
                ws1.Cells[2, i + 1] = myRdr.GetName(i);
            }
            //Read the data and put it into the spreadsheet, starting at row 3.
            //Note: one COM call per cell is slow for large tables.
            int j = 3;
            while (myRdr.Read())
            {
                for (int i = 0; i < myRdr.FieldCount; i++)
                {
                    ws1.Cells[j, i + 1] = myRdr[i].ToString();
                }
                j++;
            }
        }
    }

    //Add some formatting to the column-name row
    Range rng1 = ws1.get_Range("A2", "H2");
    rng1.Font.Bold = true;
    rng1.Font.ColorIndex = 3;
    rng1.HorizontalAlignment = XlHAlign.xlHAlignCenter;
    Range rng2 = ws1.get_Range("A2", "H50");
    rng2.WrapText = false;
    rng2.EntireColumn.AutoFit();

    //Insert a title row at the top
    ws1.get_Range("A1", "H1").EntireRow.Insert(XlInsertShiftDirection.xlShiftDown, Missing.Value);
    ws1.Cells[1, 1] = "Employee Contact List";
    Range rng3 = ws1.get_Range("A1", "H1");
    rng3.Merge(Missing.Value);
    rng3.Font.Size = 16;
    rng3.Font.ColorIndex = 3;
    rng3.Font.Underline = true;
    rng3.Font.Bold = true;
    rng3.VerticalAlignment = XlVAlign.xlVAlignCenter;

    //Save and close. Use a full path: Excel resolves a relative name against
    //its own default folder, not against the program's working directory.
    string strFileName = System.IO.Path.GetFullPath(
        String.Format("Employees{0}.xlsx", DateTime.Now.ToString("HHmmss")));
    System.IO.File.Delete(strFileName);
    wb1.SaveAs(strFileName, XlFileFormat.xlWorkbookDefault, Missing.Value, Missing.Value, Missing.Value, Missing.Value,
        XlSaveAsAccessMode.xlExclusive, Missing.Value, false, Missing.Value, Missing.Value, Missing.Value);
    wb1.Close(false, Missing.Value, Missing.Value);
    myExcel.Quit();

    //Release the COM objects so the Excel process can exit
    Marshal.ReleaseComObject(ws1);
    Marshal.ReleaseComObject(wb1);
    Marshal.ReleaseComObject(myExcel);
}
Some things for your consideration...
If this is a client-side solution, there is nothing wrong with using interop.
If this is a server-side solution, don't use interop. If you don't want a third-party solution, a good alternative is Microsoft's Open XML SDK. It's free. I believe the latest version has an object model similar to Excel's. It's a lot faster (A LOT) at generating the workbook than going the interop route, which can bog down your server.
I once read that the easiest way to create an Excel table is to write an HTML table, including its structure and data, and simply give the file an .xls extension.
Excel will be able to open it, but it will display a warning saying that the file's content does not match its extension.
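A minimal sketch of that trick, assuming the data is available as rows of strings; `HtmlTableExport` is a hypothetical name. It streams to a `TextWriter` instead of building one huge string, and HTML-encodes every cell so characters such as `<` or `&` don't break the markup:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net;

// Hypothetical helper: writes a plain HTML table that Excel can open
// when the file is saved with an .xls extension (Excel shows a format warning).
public static class HtmlTableExport
{
    public static void Write(TextWriter writer, IList<string> headers, IEnumerable<IList<string>> rows)
    {
        writer.WriteLine("<html><head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"></head><body><table>");

        // Header row
        writer.Write("<tr>");
        foreach (string h in headers)
            writer.Write("<th>" + WebUtility.HtmlEncode(h) + "</th>");
        writer.WriteLine("</tr>");

        // Data rows; null cells become empty cells
        foreach (IList<string> row in rows)
        {
            writer.Write("<tr>");
            foreach (string cell in row)
                writer.Write("<td>" + WebUtility.HtmlEncode(cell ?? "") + "</td>");
            writer.WriteLine("</tr>");
        }

        writer.WriteLine("</table></body></html>");
    }
}
```

To produce the file you would pass something like `new StreamWriter("report.xls", false, Encoding.UTF8)`; a `List<List<string>>` can be passed directly as `rows`.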
I agree that a 3rd party dll would be cleaner than the com, but if you go the interop route...
Hands down, the best way to populate an Excel sheet is to first put the data into a two-dimensional array (object[,] is the usual choice), then get an Excel Range object with the same dimensions and assign the array to it in a single call (range.Value2 = array;). Any other method is hideously slow.
Also be sure you use the appropriate cleanup code in your finally block.
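The part of that approach that does not need Excel can be sketched as follows: copying nested lists into an `object[,]` and computing the address of the bottom-right cell so the target range has exactly the same size. `ExcelBulkWrite` and its methods are hypothetical names; the interop call itself is only shown in a comment because it needs Excel installed:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for bulk writes via Range.Value2. Intended use (interop, not compiled here):
//   object[,] data = ExcelBulkWrite.ToArray(rows, 5);
//   Range target = sheet.get_Range("A1", ExcelBulkWrite.CellAddress(rows.Count, 5));
//   target.Value2 = data;   // one COM call for the whole block
public static class ExcelBulkWrite
{
    // Copies row-major string data into a rectangular, zero-based object[,].
    // Short rows leave the remaining cells null (empty in Excel).
    public static object[,] ToArray<TRow>(IList<TRow> rows, int columnCount) where TRow : IList<string>
    {
        var data = new object[rows.Count, columnCount];
        for (int r = 0; r < rows.Count; r++)
            for (int c = 0; c < columnCount && c < rows[r].Count; c++)
                data[r, c] = rows[r][c];
        return data;
    }

    // Converts a 1-based column number to Excel letters: 1 -> A, 27 -> AA.
    public static string ColumnLetters(int column)
    {
        if (column < 1) throw new ArgumentOutOfRangeException("column");
        string letters = "";
        while (column > 0)
        {
            int rem = (column - 1) % 26;
            letters = (char)('A' + rem) + letters;
            column = (column - 1) / 26;
        }
        return letters;
    }

    // A1-style address of a cell, e.g. (3, 5) -> "E3".
    public static string CellAddress(int row, int column)
    {
        return ColumnLetters(column) + row;
    }
}
```

For a couple million rows, building one giant array may use a lot of memory; writing blocks of, say, a few thousand rows at a time keeps each array small while still avoiding per-cell calls.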
I implemented "export to Excel" with the MS Access OLE DB driver, which can also read and write Excel files, in the following way:
Preparation (done once):
1. Create an Excel file that contains everything (headers, formatting, formulas, diagrams) with an empty data area, as a template to be filled.
2. Give the data area (including the headers) a name (e.g. "MyData").
Implementing the export:
1. Copy the template file to the destination folder.
2. Open an OLE DB connection to the destination file.
3. Use SQL to insert the data.
Example:
An Excel table with the named area "MyData" and the headers Name, FamilyName, Birthday.
Open a System.Data.OleDb.OleDbConnection, then execute the SQL "Insert into MyData(Name, FamilyName, Birthday) values(...)".
I used this connection string:
private const string FORMAT_EXCEL_CONNECT =
    // @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=""Excel 8.0;HDR={1}""";
    @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0;HDR={1}""";

private static string GetExcelConnectionString(string excelFilePath, bool header)
{
    return string.Format(FORMAT_EXCEL_CONNECT,
        excelFilePath,
        header ? "Yes" : "No");
}
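A variant that picks the provider from the file extension instead of keeping one string commented out. This is a sketch under assumptions: Jet 4.0 with "Excel 8.0" for .xls (Jet is 32-bit only), and ACE 12.0 with "Excel 12.0 Xml", the Extended Properties value Microsoft documents for .xlsx (the answer above uses plain "Excel 12.0"). The class name is hypothetical:

```csharp
using System;
using System.IO;

// Hypothetical helper: chooses the OLE DB provider by file extension.
public static class ExcelOleDb
{
    public static string GetConnectionString(string excelFilePath, bool header)
    {
        string ext = Path.GetExtension(excelFilePath).ToLowerInvariant();
        string hdr = header ? "Yes" : "No";
        switch (ext)
        {
            case ".xls": // Excel 97-2003, readable by the old Jet provider
                return string.Format(
                    @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=""Excel 8.0;HDR={1}""",
                    excelFilePath, hdr);
            case ".xlsx": // Excel 2007+, needs the ACE provider installed
                return string.Format(
                    @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0 Xml;HDR={1}""",
                    excelFilePath, hdr);
            default:
                throw new NotSupportedException("Unsupported Excel file type: " + ext);
        }
    }
}
```

The resulting string is passed to `new OleDbConnection(...)`; the INSERT into the named range then uses positional `?` parameters, since OLE DB ignores parameter names.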
Closed. This question needs to be more focused. It is not currently accepting answers. (Closed 5 years ago.)
I'm trying to save an existing Excel file into my SQL Server 2016 database, via SSMS or C#.
I want to save each row of my Excel file in a C# object and then save it into my database, or do you have better ideas?
I also thought about saving the Excel file as a *.csv and importing this file into my database via SSMS.
Which of these two ideas would you recommend or is there any other way to solve this problem?
If you have any questions, I would be pleased to answer them.
I thank you in advance for all the answers and tips!
For your problem you can try the following approaches:
1) Using SqlBulkCopy:
The SqlBulkCopy class, as the name suggests, performs a bulk insert from one source to another, so all rows from the Excel sheet can easily be read and inserted with it.
protected void Upload(object sender, EventArgs e)
{
//Upload and save the file
string excelPath = Server.MapPath("~/Files/") + Path.GetFileName(FileUpload1.PostedFile.FileName);
FileUpload1.SaveAs(excelPath);
string conString = string.Empty;
string extension = Path.GetExtension(FileUpload1.PostedFile.FileName);
switch (extension)
{
case ".xls": //Excel 97-03
conString = ConfigurationManager.ConnectionStrings["Excel03ConString"].ConnectionString;
break;
case ".xlsx": //Excel 07 or higher
conString = ConfigurationManager.ConnectionStrings["Excel07+ConString"].ConnectionString;
break;
default:
throw new NotSupportedException("Unsupported file type: " + extension);
}
conString = string.Format(conString, excelPath);
using (OleDbConnection excel_con = new OleDbConnection(conString))
{
excel_con.Open();
string sheet1 = excel_con.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null).Rows[0]["TABLE_NAME"].ToString();
DataTable dtExcelData = new DataTable();
//[OPTIONAL]: It is recommended as otherwise the data will be considered as String by default.
dtExcelData.Columns.AddRange(new DataColumn[3] { new DataColumn("Id", typeof(int)),
new DataColumn("Name", typeof(string)),
new DataColumn("Salary",typeof(decimal)) });
using (OleDbDataAdapter oda = new OleDbDataAdapter("SELECT * FROM [" + sheet1 + "]", excel_con))
{
oda.Fill(dtExcelData);
}
excel_con.Close();
string consString = ConfigurationManager.ConnectionStrings["constr"].ConnectionString;
using (SqlConnection con = new SqlConnection(consString))
{
using (SqlBulkCopy sqlBulkCopy = new SqlBulkCopy(con))
{
//Set the database table name
sqlBulkCopy.DestinationTableName = "dbo.tblPersons";
//[OPTIONAL]: Map the Excel columns with that of the database table
sqlBulkCopy.ColumnMappings.Add("Id", "PersonId");
sqlBulkCopy.ColumnMappings.Add("Name", "Name");
sqlBulkCopy.ColumnMappings.Add("Salary", "Salary");
con.Open();
sqlBulkCopy.WriteToServer(dtExcelData);
con.Close();
}
}
}
}
This code assumes an Excel sheet with three columns, Id, Name and Salary, and copies its rows into the dbo.tblPersons table.
2) Using DTS (the Import and Export Wizard) in SSMS:
You can use the SQL Server Data Transformation Services (DTS) Import Wizard or the SQL Server Import and Export Wizard to import Excel data into SQL Server tables; DTS was replaced by SSIS in SQL Server 2005, so on SQL Server 2016 this is the Import and Export Wizard. When you step through the wizard and select the Excel source tables, remember that Excel object names appended with a dollar sign ($) represent worksheets (for example, Sheet1$), and plain object names without the dollar sign represent Excel named ranges.
3) Using SSIS package:
You can create an SSIS package to import the Excel file. For this, you can use BIDS in Visual Studio or SQL Server Data Tools.
Use your Excel file as the Excel source and your SQL Server database table as the destination.
Perform the necessary mappings and you're good to go.
When should you use which approach?
Use approach 1 whenever you provide functionality for users to import Excel files themselves, i.e. the application lets users upload a local Excel sheet. In this case, make sure users know the expected template: if your code imports an Excel file with 3 columns and a user uploads one with 4 columns, the import will fail. So provide a template that users download, fill in and upload.
Use approach 2 whenever you need to load the data only once, i.e. for an initial load. It is the simplest approach and takes the least time to configure.
Use approach 3 whenever you need to import Excel data on a regular schedule from some shared location. For example, you import monthly mobile phone bills provided by a vendor into your database. Create an SSIS package for this and configure it.
Once the package is created, you can create a SQL Server Agent job and schedule it as required.
You can also use BULK INSERT to import a data file (such as a CSV export of the sheet) into a database table or view in a user-specified format in SQL Server.
As always, it depends on usage, change frequency, who is going to maintain the solution, etc.
SSIS and CSV import
It is possible to create an SSIS package that imports your data automatically when deployed on the SQL Server, or manually. This would be the simplest and quickest to implement. One advantage of using the Visual Studio tooling for SSIS development is that you get a visual representation of the mappings and the flow.
The drawback: even though I have seen automated column mapping updates (C# automatic SSIS package generation), whenever you need to add, remove or change a column, you would need to make a manual change.
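If you go the CSV route and produce the file yourself, fields containing commas, quotes or line breaks must be quoted, otherwise the import splits them into extra columns. A minimal sketch of RFC 4180-style quoting, assuming the flat file source is configured with `"` as its text qualifier; `CsvWriter` is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: writes rows as RFC 4180-style CSV.
public static class CsvWriter
{
    private static readonly char[] SpecialChars = { ',', '"', '\r', '\n' };

    // Quotes a field only when needed and doubles any embedded quotes.
    public static string Escape(string field)
    {
        if (field == null) return "";
        bool needsQuotes = field.IndexOfAny(SpecialChars) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    public static void WriteRows(TextWriter writer, IEnumerable<IList<string>> rows)
    {
        foreach (IList<string> row in rows)
        {
            for (int i = 0; i < row.Count; i++)
            {
                if (i > 0) writer.Write(',');
                writer.Write(Escape(row[i]));
            }
            writer.Write("\r\n"); // RFC 4180 specifies CRLF line endings
        }
    }
}
```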
BCP
A Microsoft console utility with which you can define columns in format files and import your CSVs. The drawback is that there is no graphical user interface, though many would argue that this is an advantage because changes are easier to review.
ORM
In an object-relational mapping solution, you would translate your Excel file into classes in an object-oriented programming language and save them as objects into a database table. The drawback is that you need some programming knowledge, but it could pay off in the longer run, because your solution could potentially get the data directly from the source of those Excel sheets.
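The "one C# object per row" idea from the question can be sketched like this, assuming the hypothetical Id/Name/Salary columns from the SqlBulkCopy example and that each row has already been read as strings. `Person` and `TryParse` are illustrative names; an ORM would then persist the resulting objects:

```csharp
using System;
using System.Globalization;

// Hypothetical row type matching an Id/Name/Salary sheet.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Salary { get; set; }

    // Maps one spreadsheet row to a typed object. Returns false instead of
    // throwing, so bad rows can be logged and skipped rather than aborting the import.
    public static bool TryParse(string[] cells, out Person person)
    {
        person = null;
        if (cells == null || cells.Length < 3) return false;

        int id;
        decimal salary;
        if (!int.TryParse(cells[0], NumberStyles.Integer, CultureInfo.InvariantCulture, out id)) return false;
        if (!decimal.TryParse(cells[2], NumberStyles.Number, CultureInfo.InvariantCulture, out salary)) return false;

        person = new Person { Id = id, Name = (cells[1] ?? "").Trim(), Salary = salary };
        return true;
    }
}
```

Parsing with `CultureInfo.InvariantCulture` is a design choice of this sketch: it keeps "1234.50" meaning the same number regardless of the machine's regional settings.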
I have reached a point of confusion when I was researching this topic...
I have to read a fixed-length text file and write it to a disconnected recordset in C#. I'm following something like this example:
public static Recordset CreateDisconnectedRecordset()
{
var rs = new Recordset(); // Create new recordset
// Add some updatable fields
rs.Fields.Append(
"name",
DataTypeEnum.adVarChar,
20,
FieldAttributeEnum.adFldUpdatable,
Missing.Value);
rs.Fields.Append(
"country",
DataTypeEnum.adVarChar,
20,
FieldAttributeEnum.adFldUpdatable,
Missing.Value);
rs.Open(
Missing.Value,
Missing.Value,
CursorTypeEnum.adOpenUnspecified,
LockTypeEnum.adLockUnspecified,
0);
// Add data
rs.AddNew(Missing.Value, Missing.Value);
rs.Fields["name"].Value = "Anders";
rs.Fields["country"].Value = "Sweden";
rs.Update(Missing.Value, Missing.Value);
return rs;
}
I'm using ADODB 2.7 and System.Reflection (for Missing.Value).
I load a file using the open file dialog and pass it through here. The field values would be replaced with the strings that the StreamReader reads from each line of the file when I open it.
I also tried splitting the creation and the opening/writing process into two classes (opening the recordset would be derived), but this is just outright unfriendly to what I do...
I also figured that there would be no need for a connection because it's reading from a text file, not CSV or a database file. Every other example I see seems to require a connection to somewhere, why I don't know...
Is there any way to actually write into a recordset with just a TXT file? Or do I have to do a few other things to actually write into it?
And if anyone asks: This is a file that came from a legacy system that was moved to text format. I can't use a more modern approach.
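Whatever ends up holding the data (a disconnected Recordset, a DataTable or a plain list), the fixed-length lines first have to be cut into fields, and no connection is needed for that part. A minimal sketch, assuming hypothetical 20-character `name` and `country` fields like those in the example above; `FixedWidthReader` is an illustrative name:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: splits fixed-length records into trimmed fields.
public static class FixedWidthReader
{
    // Cuts one line into fields of the given widths.
    // Lines that are too short yield empty strings for the missing fields.
    public static string[] Split(string line, int[] widths)
    {
        var fields = new string[widths.Length];
        int pos = 0;
        for (int i = 0; i < widths.Length; i++)
        {
            if (pos >= line.Length)
            {
                fields[i] = "";
            }
            else
            {
                int len = Math.Min(widths[i], line.Length - pos);
                fields[i] = line.Substring(pos, len).Trim(); // drop padding
            }
            pos += widths[i];
        }
        return fields;
    }

    // Reads every non-empty line from the reader (e.g. a StreamReader on the file).
    public static IEnumerable<string[]> ReadAll(TextReader reader, int[] widths)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue; // skip blank lines
            yield return Split(line, widths);
        }
    }
}
```

Each `string[]` could then be copied into the recordset with the same `AddNew`, `Fields["..."].Value` and `Update` calls as in the example.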
I am writing an application to open an Excel sheet and read it
MyApp = new Excel.Application();
MyBook = MyApp.Workbooks.Open(filename);
MySheet = (Excel.Worksheet)MyBook.Sheets[1]; // Explicit cast is not required here
lastRow = MySheet.Cells.SpecialCells(Excel.XlCellType.xlCellTypeLastCell).Row;
MyApp.Visible = false;
It takes about 6-7 seconds for this to take place, is this normal with interop Excel?
Also is there a quicker way to Read an Excel than this?
string[] xx = new string[lastRow];
for (int index = 1; index <= lastRow; index++)
{
int maxCol = endCol - startCol;
for (int j = 1; j <= maxCol; j++)
{
try
{
xx[index - 1] += (MySheet.Cells[index, j] as Excel.Range).Value2.ToString();
}
catch
{
}
if (j != maxCol) xx[index - 1] += "|";
}
}
MyApp.Quit();
System.Runtime.InteropServices.Marshal.ReleaseComObject(MySheet);
System.Runtime.InteropServices.Marshal.ReleaseComObject(MyBook);
System.Runtime.InteropServices.Marshal.ReleaseComObject(MyApp);
Appending to the answer of @RvdK: yes, COM interop is slow.
Why is it slow?
It is due to how COM works. Every call made from .NET must be marshaled to a local COM proxy; from there it is marshaled from one process (your app) to the COM server (Excel) through IPC inside the Windows kernel. Then it is translated (dispatched) from the server's local proxy into native code, where the arguments are marshaled from OLE Automation-compatible types into native types, their validity is checked, and the function is performed. The result of the function travels back in roughly the same way, through several layers between two different processes.
So every single command is quite expensive to execute, and the more of them you issue, the slower the whole process is. You can find lots of documentation all over the web, as COM is an old and well-established standard (somewhat fading since Visual Basic 6).
One example of such article is here: http://www.codeproject.com/Articles/990/Understanding-Classic-COM-Interoperability-With-NE
Is there a quicker way to read?
ClosedXML can both read and write Excel xlsx files (even formulas, formatting and stuff) using Microsoft's OpenXml SDK, see here: https://closedxml.codeplex.com/wikipage?title=Finding%20and%20extracting%20the%20data&referringTitle=Documentation
Excel Data Reader claims to be able to read both legacy and new Excel data files. I did not try it myself; take a look here: https://exceldatareader.codeplex.com/
Another way to read data faster is to use Excel automation to convert the sheet into a data file that you can easily parse and batch process without the interop layer (e.g. XML, CSV). This answer shows how to do it
Short answer: correct, interop is slow. (I had the same problem: it took a couple of seconds to read 300 lines.)
Use a library for this:
http://epplus.codeplex.com/
http://npoi.codeplex.com/
This answer is only about the second part of your question.
You are using lots of ranges there, which is not how it's intended to be used and is indeed very slow.
First read the complete range and then iterate over the result like so:
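object[,] xx = (object[,])MySheet.get_Range("A1", "XX100").Value2;
// The array returned by Value2 is 1-based, so loop between its actual bounds
for (int i = xx.GetLowerBound(0); i <= xx.GetUpperBound(0); i++)
{
    for (int j = xx.GetLowerBound(1); j <= xx.GetUpperBound(1); j++)
    {
        // Empty cells come back as null
        Console.WriteLine(xx[i, j] == null ? "" : xx[i, j].ToString());
    }
}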
This will be much faster!
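The detail that trips up most ports of this loop is that, for a multi-cell range, `Value2` returns an `object[,]` whose indexes start at 1, and empty cells come back as `null`. The sketch below mimics such an array with `Array.CreateInstance` so it runs without Excel; `Value2Reader` and its methods are hypothetical names. It builds the same `|`-separated lines as the question's code:

```csharp
using System;
using System.Text;

// Hypothetical helper: processes a 2D array the way Range.Value2 returns it.
public static class Value2Reader
{
    // Joins each row with '|', honoring the array's real lower bounds.
    public static string[] JoinRows(object[,] values)
    {
        int rowLo = values.GetLowerBound(0), rowHi = values.GetUpperBound(0);
        int colLo = values.GetLowerBound(1), colHi = values.GetUpperBound(1);
        var lines = new string[rowHi - rowLo + 1];
        for (int r = rowLo; r <= rowHi; r++)
        {
            var sb = new StringBuilder();
            for (int c = colLo; c <= colHi; c++)
            {
                if (c > colLo) sb.Append('|');
                object cell = values[r, c];
                if (cell != null) sb.Append(cell); // empty cells are null
            }
            lines[r - rowLo] = sb.ToString();
        }
        return lines;
    }

    // Creates a 1-based array shaped like Value2's result, for testing without Excel.
    public static object[,] CreateOneBased(int rows, int cols)
    {
        return (object[,])Array.CreateInstance(typeof(object), new[] { rows, cols }, new[] { 1, 1 });
    }
}
```

In the real code, `JoinRows((object[,])range.Value2)` replaces the per-cell loop, so the whole read costs one COM call.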
You can use this free library, which supports both .xls and .xlsx:
Workbook wb = new Workbook();
wb.LoadFromFile(ofd.FileName);
https://freenetexcel.codeplex.com/
I have a requirement to generate output report in Excel format and open the same on the screen when the processing is complete. But in this case, it should not save the report on the drive anywhere and only open on the screen.
I tried to use ADO with OLE DB, but it always generates the file before writing anything to it.
This is what I have tried so far.
using (OleDbConnection con = new OleDbConnection(connString))
{
try
{
con.Open();
}
catch (InvalidOperationException invalidEx)
{
//Exception handling
}
// Create table for excel structure
StringBuilder strSQL = new StringBuilder();
strSQL.Append("CREATE TABLE [" + tableName + "]([TITLE] text,[SURNAME] text,[STATUS] text)");
// Define file columns
StringBuilder strfield = new StringBuilder();
strfield.Append("[TITLE],[SURNAME],[STATUS]");
OleDbCommand cmd = new OleDbCommand(strSQL.ToString(), con);
cmd.ExecuteNonQuery(); // This creates the table
//Actual row for creating and insering row - logic not shown completely
cmd.CommandText = strSQL.Append(" insert into [" + tableName + "]( ")
.Append(strfield.ToString())
.Append(") values (").Append(strvalue).Append(")").ToString();
success = cmd.ExecuteNonQuery();
But this always creates the file first which I do not want.
Please advise if anyone has worked on the similar requirement. Thanks.
OK, first off, using ADO (a database access technology) to try to create a spreadsheet is bizarre; possibly doable, but definitely not easy.
Secondly, you're saying create a spreadsheet and open it without creating a file. This means that you'll also have to create ALL the functionality to open, parse, format and display spreadsheets (basically recreate Excel!), as Excel cannot do this for you.
So I would question the "generate output report in Excel format" requirement. Does it really mean "display in a grid"? Or "display in a grid that allows formatting and totalling"?
If the Excel format really is a requirement, then the only thing I can suggest is to create a temporary Excel file and delete it after you've displayed it.
I would look at the ClosedXML library that really simplifies the use of OpenXML to create xlsx spreadsheets.
Perhaps this Microsoft Article will help: How to: Open a spreadsheet document from a stream (Open XML SDK)
I am attempting to export rows of data from sql to excel but my Insert Command seems to fail every time. I have spent a good deal of time trying to create this but I have finally run up against the wall.
The Excel document is one that is generated by the IRS, and we are not allowed to modify anything above row 16. Row 16 is the header row, and everything below it needs to be the data from SQL. The header names all contain spaces, and that seems to be where I am running into trouble.
Starting at row 16 the column names are:
Attendee First Name, Attendee Last Name, Attendee PTIN, Program Number, CE Hours Awarded Program, Completion Date
This is how I am attempting to write to excel
private void GenerateReport()
{
FileInfo xlsFileInfo = new FileInfo(Server.MapPath(CE_REPORTS_PATH + CE_PTIN_TEMPLATE + EXTENSION));
string connectionString = String.Format(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='Excel 8.0;HDR=Yes'", xlsFileInfo.FullName);
//create connection
OleDbConnection oleDBConnection = new OleDbConnection(connectionString);
oleDBConnection.Open();
//create the adapter with the select to get
OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$A16:F16]", oleDBConnection);
// Create the dataset and fill it by using the adapter.
DataTable dataTable = new DataTable();
adapter.FillSchema(dataTable, SchemaType.Source);
adapter.Fill(dataTable);
string[] colNames = new string[dataTable.Columns.Count];
string[] colParms = new string[dataTable.Columns.Count];
for (int i = 0; i < dataTable.Columns.Count; i++)
{
colNames[i] = String.Format("[{0}]", dataTable.Columns[i].ColumnName);
colParms[i] = "?";
}
// Create Insert Command
adapter.InsertCommand = new OleDbCommand(String.Format("INSERT INTO [Sheet1$] ({0}) values ({1})", string.Join(",", colNames), string.Join(",", colParms)), oleDBConnection);
// Create Paramaters
for (int i = 0; i < dataTable.Columns.Count; i++)
{
OleDbParameter param = new OleDbParameter(String.Format("@[{0}]", dataTable.Columns[i].ColumnName), OleDbType.Char, 255, dataTable.Columns[i].ColumnName);
adapter.InsertCommand.Parameters.Add(param);
}
// create a new row
DataRow newCERecord = dataTable.NewRow();
// populate row with test data
for (int i = 0; i < dataTable.Columns.Count; i++)
{
newCERecord[i] = "new Data";
}
dataTable.Rows.Add(newCERecord);
// Call update on the adapter to save all the changes to the dataset
adapter.Update(dataTable);
oleDBConnection.Close();
}
The error I get happens when adapter.Update(dataTable) is called and is as follows
$exception {"The INSERT INTO statement contains the following unknown field name: 'Attendee First Name'. Make sure you have typed the name correctly, and try the operation again."} System.Exception {System.Data.OleDb.OleDbException}
This is frustrating because I pull each field directly from the column name as gotten by colNames[i] = String.Format("[{0}]", dataTable.Columns[i].ColumnName). I discovered I needed the [] to account for the spaces in the column names, but at this point I am not sure what the problem is. When I look at the excel file everything seems correct to me.
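One plausible explanation, not verified here: the SELECT reads its headers from `[Sheet1$A16:F16]`, but `INSERT INTO [Sheet1$]` makes the driver look for headers in the first row of the sheet's used range, where no column called 'Attendee First Name' exists. A sketch that builds the INSERT against the same range reference as the SELECT, assuming the driver honors a range in INSERT the way it does in SELECT; `RangeInsert` and its methods are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds OLE DB SQL that targets a specific header range.
public static class RangeInsert
{
    // e.g. ("Sheet1", "A16", "F16") -> "[Sheet1$A16:F16]"
    public static string RangeReference(string sheet, string firstCell, string lastCell)
    {
        return string.Format("[{0}${1}:{2}]", sheet, firstCell, lastCell);
    }

    // Brackets each column name (they contain spaces) and uses positional
    // '?' markers, because OLE DB binds parameters by position, not by name.
    public static string BuildInsert(string rangeRef, IList<string> columns)
    {
        var names = new string[columns.Count];
        var marks = new string[columns.Count];
        for (int i = 0; i < columns.Count; i++)
        {
            names[i] = "[" + columns[i] + "]";
            marks[i] = "?";
        }
        return string.Format("INSERT INTO {0} ({1}) VALUES ({2})",
            rangeRef, string.Join(",", names), string.Join(",", marks));
    }
}
```

With this, the adapter's SELECT and InsertCommand would both use `RangeReference("Sheet1", "A16", "F16")`.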
I actually found a Microsoft article for you that has the entire code done - you can likely copy & paste whichever solution you like most. Here's the link:
http://support.microsoft.com/kb/306023
The approach using CopyFromRecordset seems to be the easiest, although they also explain the one I mentioned (using a tab-delimited file).
Edit: Here's my original answer for the sake of completeness. See the link above instead for more details and for a possible better solution.
This is not an answer to your question but a suggestion to change your approach (if you can). Excel tends to be very slow when adding data through COM calls and I assume OleDB uses COM internally. In my experience the fastest (and coincidentally the least painful way) to output data to Excel was to generate a tab-separated text file with all the data and then just import the file into Excel and use COM interop to perform any formatting on the sheet. When I generated Excel reports this way, most of my reports used to be generated almost 100x faster than using the Excel COM object model. (I don't know if this would be the same for OleDB calls, since I've never used OleDB with Excel but I'd be willing to bet the OleDB adapter uses COM internally.)
This would also take care of your embedded space problem since tab would be the column separator.
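A minimal sketch of generating such a tab-separated file. Replacing embedded tabs and line breaks with spaces is an assumption of this sketch (the answer doesn't say how to handle them), and `TsvWriter` is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: writes rows as tab-separated text for Excel to import.
public static class TsvWriter
{
    // A tab or line break inside a value would shift columns or split rows,
    // so this sketch replaces them with spaces.
    public static string Clean(string value)
    {
        if (value == null) return "";
        return value.Replace('\t', ' ').Replace('\r', ' ').Replace('\n', ' ');
    }

    public static void WriteRows(TextWriter writer, IEnumerable<IList<string>> rows)
    {
        foreach (IList<string> row in rows)
        {
            for (int i = 0; i < row.Count; i++)
            {
                if (i > 0) writer.Write('\t');
                writer.Write(Clean(row[i]));
            }
            writer.WriteLine();
        }
    }
}
```

Column names with spaces, such as "Attendee First Name", pass through unchanged. Excel can then open the file (for example through `Workbooks.OpenText` when using interop) before the data is copied below row 16.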
In your particular situation, I'd import the text file into Excel into a new sheet and then copy & paste it into the IRS sheet, at the right location. When done, the temporary sheet can be deleted.