I need to read data from an Excel 2007 file. Which of these is the best way to do that:
Using OLEDB Provider
Excel Interop Object
Dump the Excel data to Database and Using Procedure
Kindly guide me in choosing.
Here are my opinions:
1. Using OLEDB Provider
will only suit your needs if you have simple, uniform structured tables. It won't help you much, for example, if you have to extract any cell formatting information. The Jet engine's buggy "row type guessing" algorithm may make this approach almost unusable. But if the data type can be uniquely identified from the first few rows of each table, this approach may be enough. Pro: it is fast, and it works even on machines where MS Excel is not installed.
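For illustration, a minimal sketch of this approach might look like the following (the file path and sheet name are assumptions, and for .xlsx files you need the newer ACE provider rather than Jet):

    using System;
    using System.Data;
    using System.Data.OleDb;

    class OleDbExcelReader
    {
        static void Main()
        {
            // ACE connection string for an .xlsx file (path is hypothetical).
            // IMEX=1 asks the driver to treat intermixed columns as text.
            string connStr = @"Provider=Microsoft.ACE.OLEDB.12.0;" +
                             @"Data Source=C:\data\input.xlsx;" +
                             @"Extended Properties=""Excel 12.0 Xml;HDR=YES;IMEX=1""";

            using (var conn = new OleDbConnection(connStr))
            {
                conn.Open();
                // Each worksheet is addressed as a table named "<SheetName>$".
                var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", conn);
                var table = new DataTable();
                adapter.Fill(table);

                foreach (DataRow row in table.Rows)
                    Console.WriteLine(string.Join("\t", row.ItemArray));
            }
        }
    }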
2. Excel Interop Object
may be very slow, especially compared to option 1, and you need MS Excel to be installed. But you get complete access to Excel's object model, you can extract almost any information (for example formatting, colors, frames, etc.) that is stored in your Excel file, and your sheets can have as complex a structure as you want.
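A minimal sketch of this approach (it assumes a reference to the Microsoft.Office.Interop.Excel assembly, Excel installed, and a hypothetical file path):

    using System;
    using Excel = Microsoft.Office.Interop.Excel;

    class InteropReader
    {
        static void Main()
        {
            var app = new Excel.Application();
            Excel.Workbook wb = null;
            try
            {
                wb = app.Workbooks.Open(@"C:\data\input.xlsx"); // hypothetical path
                var sheet = (Excel.Worksheet)wb.Worksheets[1];
                var cell = (Excel.Range)sheet.Cells[1, 1];

                // Full access to values *and* formatting, which OLEDB cannot give you.
                Console.WriteLine("Value: " + cell.Value2);
                Console.WriteLine("Bold:  " + cell.Font.Bold);
                Console.WriteLine("Color: " + cell.Interior.Color);
            }
            finally
            {
                if (wb != null) wb.Close(false);
                app.Quit(); // otherwise the Excel process lingers
            }
        }
    }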
3. Dump the Excel data to Database and Using Procedure
depends on what kind of database dump you have in mind, and on whether you have a database system at hand. If you are thinking of MS Access, this will internally use the Jet engine again, with the same pros and cons as approach 1 above.
Other options:
4. Write an Excel VBA macro to read the data you need and write it to a text file, then read the text file from a C# program. Pro: much faster than approach 2, with the same flexibility in accessing meta information. Con: you have to split your program into a VBA part and a C# part. And you need MS Excel on your machine.
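The C# half of that split could be as small as the following sketch (the path and the tab delimiter are assumptions about what the macro writes out):

    using System;
    using System.IO;

    class MacroOutputReader
    {
        static void Main()
        {
            // The VBA macro is assumed to write tab-separated lines to this path.
            foreach (string line in File.ReadLines(@"C:\data\macro-output.txt"))
            {
                string[] fields = line.Split('\t');
                Console.WriteLine(string.Join(" | ", fields));
            }
        }
    }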
5. Use a third party library / component for this task. There are plenty of libraries for the job, free and commercial ones. Just ask Google, or search here on SO. Lots of those libs don't require MS Excel on the machine, and they are typically the best option if you are going to extract the data as part of a server process.
Options 1 and 2 are almost always an exercise in pain, no matter how you ask the question.
If you can use SSIS to move the data into a database, and if that suits your needs because of other requirements, that's also a good option.
But the preferred option is usually to use Office Open XML for Excel 2007 and later. That has none of the COM headaches you get with Option 2, and none of the issues you have with guessing row types as you have with Option 1.
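For illustration, a minimal read-only sketch using the Open XML SDK (the DocumentFormat.OpenXml package; the file path is an assumption):

    using System;
    using System.Linq;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Spreadsheet;

    class OpenXmlReader
    {
        static void Main()
        {
            using (var doc = SpreadsheetDocument.Open(@"C:\data\input.xlsx", false))
            {
                WorkbookPart wbPart = doc.WorkbookPart;
                WorksheetPart wsPart = wbPart.WorksheetParts.First();
                // Assumes the workbook contains at least one text cell.
                SharedStringTable sst = wbPart.SharedStringTablePart.SharedStringTable;

                foreach (Cell cell in wsPart.Worksheet.Descendants<Cell>())
                {
                    string value = cell.CellValue == null ? "" : cell.CellValue.InnerText;
                    // Text cells store an index into the shared string table.
                    if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
                        value = sst.ElementAt(int.Parse(value)).InnerText;
                    Console.WriteLine(cell.CellReference + ": " + value);
                }
            }
        }
    }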
With a more carefully crafted question, you can get a far better answer, though.
Related
I have this situation, maybe it is too basic for this site, but I still hope to get some suggestions:
There are 3 different systems that I need to collect data from, all on different servers in a local network. One of them is based on a MySQL database which I have complete access to, the second is based on an MS Access database, and the third has a flat-file database whose data can only be accessed through txt exports from the application.
I need to collect the data into an independent database and create Excel and PDF reports.
I don't need charts; a nicely formatted Excel table should be just fine.
Data is updated each hour, so it should be collected and the report produced every hour.
Any suggestions about how to integrate the data, and which DBMS is best to use for this purpose?
What is the best option for creating Excel and PDF reports without having to buy any software?
I hope to get some guidelines, thank you.
Well, something to look into for Excel is the Microsoft Interop libraries. They're free and integrate directly into the Office tools.
http://msdn.microsoft.com/en-us/library/microsoft.office.interop.excel.aspx
As far as your database situation goes, is there any problem with just coding the server calls and statements into different classes, then setting them on a one-hour timer to refresh and collect new data?
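As a rough sketch of that timer idea (the three collection steps below are hypothetical placeholders for the real source-system logic):

    using System;
    using System.Threading;

    class HourlyCollector
    {
        // Hypothetical collection steps for the three source systems;
        // each would hold the real connection/parsing code.
        static readonly Action[] Collectors =
        {
            () => Console.WriteLine("querying the MySQL server..."),
            () => Console.WriteLine("reading the Access database via Jet/ACE..."),
            () => Console.WriteLine("parsing the application's txt exports...")
        };

        static void Main()
        {
            // Fire immediately, then once per hour.
            using (var timer = new Timer(_ => RunAll(), null,
                                         TimeSpan.Zero, TimeSpan.FromHours(1)))
            {
                Console.WriteLine("Collecting hourly; press Enter to stop.");
                Console.ReadLine();
            }
        }

        static void RunAll()
        {
            foreach (var collect in Collectors)
            {
                try { collect(); }
                catch (Exception ex) { Console.Error.WriteLine(ex.Message); }
            }
        }
    }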
I know there are a thousand data-import-related questions on Stack Overflow, and please accept my apologies if this has already been asked somewhere, but I wondered if anybody could advise of any tried and tested solutions for normalising data during an import from CSV/Excel in C#/ASP.NET MVC 3+.
I could code something to do the job but wondered if there were any open source libraries or tools which could help out with this.
My area of interest is as follows:
When importing data, I occasionally need to normalise some fields, a simplistic example of this is shown below:
My input may be:
Name, JobTitle
==============
Nick, Manager
Dan, Coder
My table structure may be:
Name, JobTitleId
================
Nick, 1
Dan, 2
If a job title doesn't exist, I'd like it to be created in my JobTitles table and the Id to be returned. If it does exist, I'd like to store the ID.
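For what it's worth, a hand-rolled sketch of that lookup-or-insert step might look like this (the JobTitles table and Id come from the example above, but the Title column and the connection string are assumptions):

    using System;
    using System.Data.SqlClient;

    class JobTitleImporter
    {
        // Returns the Id for a job title, inserting the title first if it is new.
        static int GetOrCreateJobTitleId(SqlConnection conn, string title)
        {
            using (var select = new SqlCommand(
                "SELECT Id FROM JobTitles WHERE Title = @title", conn))
            {
                select.Parameters.AddWithValue("@title", title);
                object id = select.ExecuteScalar();
                if (id != null) return (int)id;
            }

            using (var insert = new SqlCommand(
                "INSERT INTO JobTitles (Title) OUTPUT INSERTED.Id VALUES (@title)", conn))
            {
                insert.Parameters.AddWithValue("@title", title);
                return (int)insert.ExecuteScalar();
            }
        }

        static void Main()
        {
            // Connection string is hypothetical.
            using (var conn = new SqlConnection(
                @"Server=.;Database=Import;Integrated Security=true"))
            {
                conn.Open();
                Console.WriteLine(GetOrCreateJobTitleId(conn, "Manager")); // e.g. 1
                Console.WriteLine(GetOrCreateJobTitleId(conn, "Coder"));   // e.g. 2
            }
        }
    }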
In summary, my questions are:
Is there a technique or approach I should be taking to do this?
Are there any open source/commercial libraries out there which handle this? No point in re-inventing the wheel.
Is there a technique or approach I should be taking to do this?
The simplest technique that I can think of is a non-MVC way: use SSIS!
You can have ASP.NET as the middle man to accept the Excel file and pass it to the database if security is a concern. There are many tutorials out there using Excel Connection Manager. e.g., Import Excel File.
SSIS gives you incredible flexibility, which might help you normalize your data.
Are there any open source/commercial libraries out there which handle this? No point in re-inventing the wheel.
As you might already have invested in Microsoft products like Visual Studio and SQL Server, you might as well leverage them to suit your needs rather than trying out the open source products out there.
Besides that, if you really want to check out tools, then I have heard people recommend these (I have not used any of them):
RelationalExcel but it is not free.
ExpertXLS Excel Library for .NET - this one is not free either, but it works with ASP.NET.
This blog seems like a good reference to tools that the author has tried.
Other alternatives could be:
Load your Excel file into datasets using .NET and generate your SQL queries dynamically, OR pass the data to a SQL Server stored procedure that does the heavy lifting.
Parse your Excel file using VBA and then use it as a data source for ASP.NET.
My requirement is that in nopCommerce 1.9 I have to insert multiple discounts from an Excel sheet that contains a lot of data, so before starting this task I need to be clear in my mind about which solution is best.
Which is the fastest way to upload an Excel sheet with more than 100,000 rows in C#?
I read this question and its answers and found that SSIS is an option.
Is SSIS really the best choice for large-file import and export?
And what other benefits will I get if I use SSIS packages?
For ~100,000 rows, performance should not be a significant problem with this type of data.
SSIS can do this, but it is not the only option. I think there are three reasonable approaches to doing this:
SSIS:
This can read Excel files. If your spreadsheet is well behaved (i.e. can be trusted to be laid out correctly) then SSIS can load the contents. It has some error logging features, but in practice it can only usefully dump a log file or write errors out to a log table. Erroneous rows can be directed to a holding table.
Pros:
Load process is fairly easy to develop.
SSIS package can be changed independently of the application if the spreadsheet format has to change.
Can read directly from spreadsheet file
Cons:
Dependency on having SSIS runtime installed on the system.
SSIS is really intended to be a server-side installation; error handling tends to consist of writing messages to logs. You would need to find a way to make error logs available to the user to troubleshoot errors.
BCP or BULK INSERT:
You can export the spreadsheet to a CSV and use BCP or a BULK INSERT statement to load the file. However, this requires the file to be exported to a CSV and copied to a drive on the database server or a share accessible to it; a sketch of running BULK INSERT from C# follows the pros and cons below.
Pros:
Fast
bcp can be assumed to be present on the server.
Cons:
Requires manual steps to export to CSV
The file must be placed on a volume that can be mounted on the server
Limited error handling facilities.
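For illustration, here is that sketch of issuing the BULK INSERT from C# (connection string, target table, file share, and CSV layout are all assumptions; remember the path is resolved on the server, not the client):

    using System;
    using System.Data.SqlClient;

    class BulkInsertRunner
    {
        static void Main()
        {
            // Hypothetical staging table and UNC path visible to SQL Server.
            const string sql = @"
                BULK INSERT dbo.ImportStaging
                FROM '\\fileserver\imports\export.csv'
                WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2)";

            using (var conn = new SqlConnection(
                @"Server=.;Database=Import;Integrated Security=true"))
            using (var cmd = new SqlCommand(sql, conn))
            {
                conn.Open();
                Console.WriteLine(cmd.ExecuteNonQuery() + " rows loaded.");
            }
        }
    }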
SqlBulkCopy API:
If you're already using .NET you can read from the spreadsheet using OLE automation or ODBC and load the data using the SQL Server bulk load API; a sketch follows the pros and cons below. This requires you to write a C# routine to do the import. If the spreadsheet is loaded manually then it can be loaded from the user's PC.
Pros:
Does not require SSIS to be installed on the computer.
The file can be located on the user's PC.
Load process can be interactive, presenting errors to the user and allowing them to correct the errors with multiple retries.
Cons:
Most effort to develop.
Really only practical as a feature on an application.
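A rough sketch of this approach, combining an OLEDB read of the sheet with SqlBulkCopy (connection strings, sheet and table names are assumptions):

    using System.Data;
    using System.Data.OleDb;
    using System.Data.SqlClient;

    class ExcelBulkLoader
    {
        static void Main()
        {
            // Source workbook (path is hypothetical).
            string excelConnStr = @"Provider=Microsoft.ACE.OLEDB.12.0;" +
                                  @"Data Source=C:\data\discounts.xlsx;" +
                                  @"Extended Properties=""Excel 12.0 Xml;HDR=YES""";

            var table = new DataTable();
            using (var excel = new OleDbConnection(excelConnStr))
            {
                new OleDbDataAdapter("SELECT * FROM [Sheet1$]", excel).Fill(table);
            }

            using (var sql = new SqlConnection(
                @"Server=.;Database=Shop;Integrated Security=true"))
            {
                sql.Open();
                using (var bulk = new SqlBulkCopy(sql))
                {
                    // Columns map by ordinal unless ColumnMappings are added.
                    bulk.DestinationTableName = "dbo.Discounts";
                    bulk.BatchSize = 5000; // commit in chunks, not one huge batch
                    bulk.WriteToServer(table);
                }
            }
        }
    }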
SSIS is an ETL tool. You can do transformations, error handling (as mentioned by Kumar), and look-ups in SSIS; you can redirect invalid rows, add derived columns, and a lot more. You can even add configuration files to it to change some of the properties/parameters...
There are more options for loading the data into SQL Server:
1. SSIS - you need to design the workflow (you need BIDS or VS to design and test the package).
2. As "demas" mentioned, you can export the data to a flat file and use BCP/BULK INSERT.
3. You can use the OPENROWSET operator in SQL (ad-hoc distributed queries must be enabled to use this functionality). Then you can just query the Excel file from SQL - this could be an easy way to read the data:
SELECT * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0','Excel 8.0;Database=C:\test.xls', 'SELECT * FROM [Sheet1$]')
Try searching Google for OPENROWSET + EXCEL to get more examples. In this scenario you can also query text files, Access databases, etc.
There are more ways to do it, but it really depends on what you want to achieve. 100K rows is really not much in this case.
SSIS is a good solution, but if performance matters most to you, I'd try converting the Excel file to a plain text file and using the BULK INSERT functionality.
Error logging - invalid rows can be easily logged into a separate table for further verification.
If you need to perform complex transformations, value checking is required, and the file is extremely large (100,000 rows in an Excel file is tiny), then SSIS may be the best solution. It is a very powerful, complex tool.
However, and it is a big however, SSIS is difficult to learn and work in effectively, and it is hard to debug. I perform imports and exports as my full-time job, and it took me well over a year to get comfortable with SSIS. (Of course, what I do with the data is very complicated and not at all straightforward, so take this with a grain of salt.) If you aren't doing any complex transformations, it isn't that bad to set up, but it is nowhere near as simple as DTS was, for instance, mostly because it has so much more available functionality. If you are doing simple imports with no transformations, I believe other methods might be more effective.
As far as performance goes, SSIS can shine (it is built to move millions or more records to data warehouses, where speed is critical) or be a real dog, depending on the way it is set up. It takes a good level of skill to get to where you can performance-tune one of these packages.
My current scenario is that I have 20 Excel files that need to be populated by running multiple scripts. Currently this is a manual process. I am about to start a small project that should automate most if not all of the work.
Tools:
I am currently using Excel 2007, but could potentially be working with Excel 2010 in the near future. I am currently planning to use VS 2005 Professional or C# Express 2010.
My approach is to create templates for the 20 files, so that formatting and data that is not changing is already saved. Through C# I plan to then insert data at appropriate cells within each "Sheet" per file.
I have come across multiple approaches that I have read about (here and on other sites), namely:
Excel Interop
OLE DB
ADO.NET
Open XML
I am trying to find out if someone has done something similar before and which one of the technologies would work best (if possible with some explanation or link for more information).
I am sorry if the question is too subjective and not appropriate. I will understand if someone decides to close it.
Thanks.
I might be missing something, but I would think OLE DB and ADO.NET end up being the same if you are talking about C#. Basically, you would use the OLE DB provider for Excel with ADO.NET.
I recommend http://www.connectionstrings.com/ for how to setup your provider.
However, if you are going to use templates, it's pretty easy. Just set up your "tables" in your Excel files. Make sure you use named ranges. All that is left is to write some basic SQL like:
INSERT INTO customers(id,name) VALUES (?, ?)
And that is it. I have used it occasionally for data manipulation. I do use it far more often for data reading.
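To make that concrete, here is a sketch of running such an INSERT against a workbook (the path and the "customers" named range are assumptions; note that IMEX=1 must not be in the connection string when writing):

    using System.Data.OleDb;

    class ExcelWriter
    {
        static void Main()
        {
            // Path is hypothetical; "customers" is a named range in the template.
            string connStr = @"Provider=Microsoft.ACE.OLEDB.12.0;" +
                             @"Data Source=C:\templates\report.xlsx;" +
                             @"Extended Properties=""Excel 12.0 Xml;HDR=YES""";

            using (var conn = new OleDbConnection(connStr))
            using (var cmd = new OleDbCommand(
                "INSERT INTO [customers] (id, name) VALUES (?, ?)", conn))
            {
                conn.Open();
                // OLEDB parameters are positional; the names are only labels.
                cmd.Parameters.AddWithValue("@id", 42);
                cmd.Parameters.AddWithValue("@name", "Acme Ltd");
                cmd.ExecuteNonQuery();
            }
        }
    }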
Regards,
Eric.
I think OLE DB is the easiest way to do this.
For me, OLEDB is the best way.
One of my current requirements is to take in an Excel spreadsheet that the user updates about once a week and be able to query that document for certain fields.
As of right now, when they upload the file I run through the Excel (2007) data once and push it all into an XML file, which then holds all of the needed data (not all of the columns in the spreadsheet) for querying via LINQ to XML; after that I just use the XML. Note that the XML file is smaller than the Excel file.
Now my question is: is there any performance difference between querying an XML file with LINQ and an Excel file with an OleDbConnection? Am I just adding another unnecessary step?
I suppose the follow-up question would be: is it worth it, for ease of use, to keep pushing it to XML?
The file has about 1000 rows.
For something that is done only once per week I don't see the need to perform any optimizations. Instead you should focus on what is maintainable and understandable both for you and whoever will maintain the solution in the future.
Use whatever solution you find most natural :-)
As I understand it, the performance side of things stands like this for accessing Excel data.
Fastest to Slowest
1. Custom 3rd party vendor software using C++ directly on the Excel file type.
2. OleDbConnection method, using a schema file if necessary for data types; treats Excel as a flat-file DB.
3. LINQ to XML method; a superior method for reading/writing data, but limited to the Excel 2007 file formats.
4. Straight XML data manipulation using the OOXML SDK and optionally 3rd-party XML libraries. Again, limited to the Excel 2007 file formats.
5. Using an object[,] array to read a region of cells (via the .Value2 property), and passing an object[,] array back again to a region of cells (again via .Value2) to write data.
6. Updating and reading from cells individually using the .Cells(x,y) and .Offset(x,y) accessors.
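To illustrate why 5 beats 6, here is a sketch of the bulk object[,] technique (the path and region are assumptions):

    using System;
    using Excel = Microsoft.Office.Interop.Excel;

    class BulkRangeAccess
    {
        static void Main()
        {
            var app = new Excel.Application();
            Excel.Workbook wb = app.Workbooks.Open(@"C:\data\input.xlsx"); // hypothetical
            try
            {
                var sheet = (Excel.Worksheet)wb.Worksheets[1];
                Excel.Range region = sheet.Range["A1", "C100"];

                // One COM round-trip for the whole region instead of 300 cell reads.
                object[,] values = (object[,])region.Value2;

                for (int r = 1; r <= values.GetLength(0); r++)   // note: 1-based
                    values[r, 1] = (values[r, 1] ?? "").ToString().ToUpper();

                region.Value2 = values; // one round-trip to write everything back
            }
            finally
            {
                wb.Close(true);
                app.Quit();
            }
        }
    }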
You can't use a SqlConnection to access an Excel spreadsheet. More than likely, you are using an OleDbConnection or an OdbcConnection.
That being said, I would guess that using the OleDbConnection to access the Excel sheet would be faster, as you are processing the data natively, but the only way to know for the data you are using is to test it yourself, using the Stopwatch class in the System.Diagnostics namespace, or using a profiling tool.
If you have a great deal of data to process, you might also want to consider putting it in SQL Server and then querying that (depending on the ratio of queries to the time it takes to save the data, of course).
I think it's important to discuss what type of querying you are doing with the file. I have to believe it will be a great deal easier to query using LINQ than the OleDbConnection, although I am speaking more from experience than anything else.
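For what it's worth, a LINQ to XML query against such a cache file might look like this sketch (the file path and element names are assumptions about how the Excel data was exported):

    using System;
    using System.Linq;
    using System.Xml.Linq;

    class XmlQueryDemo
    {
        static void Main()
        {
            // Path and element names are hypothetical -- they depend on how
            // the Excel data was written out to XML in the first place.
            XDocument doc = XDocument.Load(@"C:\data\cache.xml");

            var managers = from row in doc.Descendants("row")
                           where (string)row.Element("JobTitle") == "Manager"
                           orderby (string)row.Element("Name")
                           select (string)row.Element("Name");

            foreach (string name in managers)
                Console.WriteLine(name);
        }
    }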