I know questions like this already exist on Stack Overflow, and there are third-party libraries that are supposed to do the trick, but none of them fixes my issue at the moment. So, the issue:
I have an Excel workbook (.xlsx) with multiple sheets generated by another system. I have to read the data from this via SSIS and dump it to a SQL DB.
The Excel file contains data: when I open it manually it opens without any error and the data is displayed. But when I use a script task with an OLEDB connection to open it, the connection is made successfully and yet the read fails: the column names are not picked up (I get F1, F2 and so on) and no data rows are read. I simply get a blank row and that's about it. I have tried HDR=YES and HDR=NO, and IMEX=1 and IMEX=0, but the result is always the same.
The funny thing is that if I open the Excel file, make some modification (for example change a sheet name, save, change the sheet name back, save and close) and then run the package, the data gets picked up without any issue. I also noticed that the file size increases from 164KB to 196KB. Because of this, what I am trying to do is modify the file slightly and save it via code.
My first attempt used Microsoft.Office.Interop.Excel, and it works like a charm on my machine, but there is NO OFFICE on the server, so it does not work there. And no, the IT guys are never going to install the Access Database Engine, Excel, or anything else there.
Then I tried OpenXML, a third-party library (NPOI), and even an OLEDB connection to modify the file. With both NPOI and OLEDB the file was changed, but it still was not picked up properly by the SSIS package (I noticed that the file size didn't change and remained at 164KB). OpenXML wasn't able to open the file at all and threw an error saying "the document cannot be opened because there is an invalid part with an unexpected content type".
So right now I am stuck with no proper method in sight, and I would appreciate any help in solving this, either through C# code or any other SSIS method available. The SSIS version I am using is 2008.
Edit 1
So I noticed that the script task is able to read the data from the first sheet out of the multiple sheets, but the other sheets are the problem. So somewhere the XML for these sheets is broken. Is there any way I can copy the XML configuration of the first sheet to the other ones? Just a thought...
Edit 2
So the first sheet is of ContentType "application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml" while all the other sheets are of ContentType "application/xml"
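For anyone who wants to check the same thing on their own file, here is a rough sketch of how the content types can be listed without Office. An .xlsx file is a zip package, and the per-part types live in `[Content_Types].xml`. The class and method names are made up for illustration, and `ZipArchive` needs .NET 4.5 or later, so this is assumed to run as a separate console tool rather than inside an SSIS 2008 script task.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: lists the explicit content-type overrides of a package.
static class ContentTypeDump
{
    // Returns (part name, content type) pairs from the Override entries
    // in the package's [Content_Types].xml.
    public static List<KeyValuePair<string, string>> ReadOverrides(Stream xlsx)
    {
        using (var zip = new ZipArchive(xlsx, ZipArchiveMode.Read, true))
        {
            var entry = zip.GetEntry("[Content_Types].xml");
            if (entry == null)
                throw new InvalidDataException("Missing [Content_Types].xml");

            using (var s = entry.Open())
            {
                XNamespace ns =
                    "http://schemas.openxmlformats.org/package/2006/content-types";
                return XDocument.Load(s).Root
                    .Elements(ns + "Override")
                    .Select(o => new KeyValuePair<string, string>(
                        (string)o.Attribute("PartName"),
                        (string)o.Attribute("ContentType")))
                    .ToList();
            }
        }
    }

    static void Main(string[] args)
    {
        // args[0] is the path to the workbook to inspect.
        using (var file = File.OpenRead(args[0]))
            foreach (var pair in ReadOverrides(file))
                Console.WriteLine(pair.Key + " -> " + pair.Value);
    }
}
```

A part with no Override entry falls back to the Default entry for its file extension. That may be where "application/xml" comes from for the other sheets, but that is a guess, not something confirmed on this file.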
I ultimately ended up using two libraries for this. The data was read without an issue using ExcelDataReader (http://exceldatareader.codeplex.com/). With it the data was read into a dataset easily, and then it was written to a new Excel file using EPPlus (http://epplus.codeplex.com/).
After that when the new excel file was read via the SSIS package data got picked without an issue. Hope this will help someone out there.
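Roughly, the read-then-rewrite step looks like the sketch below. It is not the exact code from the package. It assumes the current NuGet builds of ExcelDataReader (plus the ExcelDataReader.DataSet package for `AsDataSet`) and EPPlus 4.x. Older CodePlex builds of ExcelDataReader exposed `ExcelReaderFactory.CreateOpenXmlReader` instead of `CreateReader`, and EPPlus 5 and later additionally require a license context to be set. `WorkbookRewriter` and the paths are placeholder names.

```csharp
using System.Data;
using System.IO;
using ExcelDataReader;   // NuGet: ExcelDataReader and ExcelDataReader.DataSet (assumed)
using OfficeOpenXml;     // NuGet: EPPlus 4.x (assumed)

// Hypothetical helper: copies every sheet of a workbook into a fresh .xlsx file.
static class WorkbookRewriter
{
    public static void Rewrite(string sourcePath, string targetPath)
    {
        DataSet data;
        using (var stream = File.Open(sourcePath, FileMode.Open, FileAccess.Read))
        using (var reader = ExcelReaderFactory.CreateReader(stream))
        {
            // One DataTable per sheet; with the default settings the header row
            // is kept as an ordinary data row.
            data = reader.AsDataSet();
        }

        using (var package = new ExcelPackage())
        {
            foreach (DataTable table in data.Tables)
            {
                var sheet = package.Workbook.Worksheets.Add(table.TableName);
                if (table.Rows.Count > 0)
                {
                    // false = do not emit the generated column names, so the
                    // original header row stays in row 1 of the new sheet.
                    sheet.Cells["A1"].LoadFromDataTable(table, false);
                }
            }
            package.SaveAs(new FileInfo(targetPath));
        }
    }
}
```

The SSIS package is then pointed at the file written to `targetPath` instead of the original.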
Related
I'm using C# and a service-based database, and I need to import some data from Excel into my database. How could I do this? Please help. Thanks a lot.
You can open the excel file with the Excel database driver and read it like any other data source, however this means you need the driver, which isn't installed by default.
Download
HowTo
However if the sheet only contains data and doesn't need any calculations, you can unzip the XLSX file, and find sheet1.xml (or whatever it's called in your file), open it in your app like any other XML file and import the data.
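A minimal sketch of that unzip-and-read approach, using only the framework's zip and XML classes (.NET 4.5 or later). The part names `xl/sharedStrings.xml` and `xl/worksheets/sheet1.xml` are the usual ones but can differ per file, and `RawSheetReader` is a made-up name. Text cells normally hold an index into the shared strings part, so that part has to be loaded as well.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: reads cell text straight from the worksheet XML.
static class RawSheetReader
{
    static readonly XNamespace Main =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    static XDocument LoadPart(ZipArchive zip, string name)
    {
        var entry = zip.GetEntry(name);
        if (entry == null) return null;
        using (var s = entry.Open()) return XDocument.Load(s);
    }

    // Plain <t> text plus the text of rich-text runs (<r><t>).
    static string ItemText(XElement item)
    {
        return string.Concat(item.Elements(Main + "t")
            .Concat(item.Elements(Main + "r").Elements(Main + "t"))
            .Select(t => t.Value));
    }

    // Returns each row as a list of cell texts, in document order.
    // Cells that are absent from the XML (empty cells) are not padded in.
    public static List<List<string>> ReadRows(Stream xlsx, string sheetPart)
    {
        using (var zip = new ZipArchive(xlsx, ZipArchiveMode.Read, true))
        {
            var sst = LoadPart(zip, "xl/sharedStrings.xml");
            var shared = sst == null
                ? new List<string>()
                : sst.Root.Elements(Main + "si").Select(ItemText).ToList();

            var sheet = LoadPart(zip, sheetPart);
            if (sheet == null) throw new FileNotFoundException(sheetPart);

            return sheet.Descendants(Main + "row")
                .Select(row => row.Elements(Main + "c")
                                  .Select(c => CellText(c, shared)).ToList())
                .ToList();
        }
    }

    static string CellText(XElement cell, List<string> shared)
    {
        var type = (string)cell.Attribute("t");
        if (type == "inlineStr")
        {
            var inline = cell.Element(Main + "is");
            return inline == null ? "" : ItemText(inline);
        }
        var value = (string)cell.Element(Main + "v") ?? "";
        // t="s" means the value is an index into the shared strings table.
        return type == "s" ? shared[int.Parse(value)] : value;
    }
}
```

Typical use is `RawSheetReader.ReadRows(File.OpenRead(path), "xl/worksheets/sheet1.xml")`. Dates come back as Excel serial numbers and formulas as their last cached value, so this only suits plain data sheets, as the answer says.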
This is likely to be a much better long term solution, since MS has been trying to kill off the Access database driver for ages.
Also, it's been a while, but I don't believe MS recommends using the MSDE from within a service.
I would recommend using OfficeOpenXml.Core.ExcelPackage or EPPlus to read/write Excel files. Below are some links for reference:
https://www.nuget.org/packages/OfficeOpenXml.Core.ExcelPackage/
https://github.com/JanKallman/EPPlus
https://www.c-sharpcorner.com/article/import-and-export-data-using-epplus-core/
https://tedgustaf.com/blog/2012/create-excel-20072010-spreadsheets-with-c-and-epplus/
http://www.talkingdotnet.com/import-export-xlsx-asp-net-core/
https://toidicodedao.com/2015/11/24/series-c-hay-ho-epplus-thu-vien-excel-ba-dao-phan-1/
http://www.zachhunter.com/2015/11/xlsx-template-with-epplus-and-web-api-save-as-xlsx-or-pdf/
My ADO application needs to read data from one xlsx file (approx. 10-20 MB minimum), process it row by row, and compare it with another xlsx file (approx. 250 MB minimum) containing over 1,000,000 rows and 63 columns, which serves as a database. When I read the database file (250 MB) and run OLEDB queries against it, it behaves strangely: it returns only the first matching row from the database file. But if I open that xlsx database file in Office Excel and then run my application, it returns all matching rows, without any change to the code.
I have already checked my query; it works fine in Server Explorer and returns the complete result.
This process is also very time-consuming. I tried the OpenXML SAX method to resolve the performance issue as well, but that did not work either; it even takes more time than OLEDB to read the Excel file.
I also have two other issues in my application:
sometimes OLEDB throws a 'System resource exceeded' exception, and
sometimes it returns 'internal OLE automation error'.
I also searched Google for these issues, but I did not find a solution for my problem.
Is there any way to resolve these issues? Please help me. Any suggestion is appreciated, but please remember one thing: I can't change the xlsx table format because I don't have the rights; these xlsx files are generated automatically by other tools at our sourcing partners.
Thanks.
Good afternoon,
we have a small problem with the performance of generating Excel files.
First, we were creating the Excel file cell by cell, which is ... let's say unacceptable.
Second, we started inserting into Excel with one command, by creating a range. It is much faster, but still not perfect, so we are looking for other solutions.
Because we can load an XML file from the database, we tried using XSLT to create an xls file from these two files. It works nicely, but when the file is opened an error message is shown (because of a problem or bug in the registry). The user has to accept this message, and only then does Excel open the file. We want to eliminate this error message, but we don't know how.
We were thinking about converting this xls file into xlsx, but we are unable to do so because we can't install Office on the server (so we cannot use Interop), and the OpenXML libraries can't work with a plain xls file. So my question is:
Is it possible to generate an xlsx file from an XML file using some XSLT (or something similar)?
Alternatively, which files do we need to create and zip together if we want to create an xlsx file?
Thank you for the information.
You mention not being able to use the OpenXML libraries because they don't work with .xls files, but you also say "creating cell by cell", which implies that you are generating the file from scratch. Where is the xls file coming from? You mention excel opening, but then say you can't install it on the server. So, it appears to me that a user is uploading an xls file to your server, and then you are doing something with it and giving it back to them? If that is the case and you must be able to read/write an xls file without installing office, then I would suggest using ExcelLibrary, as mentioned in this post
Indeed, creating an xlsx file is much faster with the Open XML SDK.
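On the question of which files have to be zipped together: as far as I know, the smallest package Excel accepts has five parts, namely `[Content_Types].xml`, `_rels/.rels`, `xl/workbook.xml`, `xl/_rels/workbook.xml.rels` and one worksheet part. The sketch below writes exactly those with the framework's `ZipArchive` (.NET 4.5 or later), storing every value as an inline string. `MinimalXlsx` is a made-up name, and this is an illustration of the package layout, not a replacement for the SDK (no styles, numbers, dates or shared strings).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Security;
using System.Text;

// Hypothetical helper: writes a single-sheet .xlsx containing only text cells.
static class MinimalXlsx
{
    const string Ssml = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
    const string DocRel = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    const string PkgRel = "http://schemas.openxmlformats.org/package/2006/relationships";
    const string Ct = "application/vnd.openxmlformats-officedocument.spreadsheetml.";

    public static void Write(Stream output, IEnumerable<string[]> rows)
    {
        using (var zip = new ZipArchive(output, ZipArchiveMode.Create, true))
        {
            Add(zip, "[Content_Types].xml",
                "<Types xmlns='http://schemas.openxmlformats.org/package/2006/content-types'>" +
                "<Default Extension='rels' ContentType='application/vnd.openxmlformats-package.relationships+xml'/>" +
                "<Default Extension='xml' ContentType='application/xml'/>" +
                "<Override PartName='/xl/workbook.xml' ContentType='" + Ct + "sheet.main+xml'/>" +
                "<Override PartName='/xl/worksheets/sheet1.xml' ContentType='" + Ct + "worksheet+xml'/>" +
                "</Types>");
            Add(zip, "_rels/.rels",
                "<Relationships xmlns='" + PkgRel + "'><Relationship Id='rId1' Type='" +
                DocRel + "/officeDocument' Target='xl/workbook.xml'/></Relationships>");
            Add(zip, "xl/workbook.xml",
                "<workbook xmlns='" + Ssml + "' xmlns:r='" + DocRel + "'><sheets>" +
                "<sheet name='Sheet1' sheetId='1' r:id='rId1'/></sheets></workbook>");
            Add(zip, "xl/_rels/workbook.xml.rels",
                "<Relationships xmlns='" + PkgRel + "'><Relationship Id='rId1' Type='" +
                DocRel + "/worksheet' Target='worksheets/sheet1.xml'/></Relationships>");
            Add(zip, "xl/worksheets/sheet1.xml", SheetXml(rows));
        }
    }

    static string SheetXml(IEnumerable<string[]> rows)
    {
        var sb = new StringBuilder("<worksheet xmlns='" + Ssml + "'><sheetData>");
        int r = 0;
        foreach (var row in rows)
        {
            r++;
            sb.Append("<row r='" + r + "'>");
            for (int c = 0; c < row.Length; c++)
                sb.Append("<c r='" + ColumnName(c) + r + "' t='inlineStr'><is><t>")
                  .Append(SecurityElement.Escape(row[c] ?? ""))   // XML-escape the text
                  .Append("</t></is></c>");
            sb.Append("</row>");
        }
        return sb.Append("</sheetData></worksheet>").ToString();
    }

    // 0 -> A, 25 -> Z, 26 -> AA
    static string ColumnName(int index)
    {
        var name = "";
        for (int n = index + 1; n > 0; n = (n - 1) / 26)
            name = (char)('A' + (n - 1) % 26) + name;
        return name;
    }

    static void Add(ZipArchive zip, string name, string xml)
    {
        using (var w = new StreamWriter(zip.CreateEntry(name).Open(), new UTF8Encoding(false)))
            w.Write(xml);
    }
}
```

The sheet XML could just as well come out of an XSLT transform of the database XML; the other four parts are fixed boilerplate for a one-sheet workbook.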
I have a small WPF product that needs to export data to Excel without Excel installed on the client machine. How can I achieve this in C#? After exporting, the file may be opened with OpenOffice. All I want is to save an Excel file to the client's hard disk. Even if Excel is not installed, the user should be able to save the file; he may not be able to read it without Excel, but he should be able to save it. I don't want to use any third-party libraries or other Open XML libraries.
Recently I downloaded a product that is able to export to Excel without Excel installed, and the result can be opened with OpenOffice.
When I checked its binaries, they contain only office.dll, Microsoft.Vbe.Interop.dll and Microsoft.Office.Interop.Excel.dll. I want to know how they manage with just these DLLs.
I have already written code for this, but it breaks when Excel is not installed.
I have read a lot about Open XML and other material relating to this, but I am not satisfied.
My requirement is very simple: just exporting DataTable data to Excel, with no reading back of the data and no fancy operations in Excel.
Please give me suggestions; links will be appreciated.
Thanks in advance.
Either go with the CSV format, or you may like to use the EPPlus library. See a similar answer here
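For the CSV route, a DataTable can be written out with nothing but the framework. A minimal sketch follows; `CsvExport` is a made-up name, and the quoting follows the usual convention of wrapping a field in double quotes when it contains a comma, quote or line break, and doubling embedded quotes.

```csharp
using System;
using System.Data;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical helper: writes a DataTable as comma-separated text.
static class CsvExport
{
    static string Escape(object value)
    {
        // DBNull converts to an empty string; the invariant culture keeps
        // number formatting independent of the machine's regional settings.
        var text = Convert.ToString(value, CultureInfo.InvariantCulture) ?? "";
        bool needsQuotes = text.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + text.Replace("\"", "\"\"") + "\"" : text;
    }

    public static void Write(DataTable table, TextWriter writer)
    {
        // Header line with the column names, then one line per row.
        writer.WriteLine(string.Join(",",
            table.Columns.Cast<DataColumn>().Select(c => Escape(c.ColumnName))));
        foreach (DataRow row in table.Rows)
            writer.WriteLine(string.Join(",", row.ItemArray.Select(v => Escape(v))));
    }
}
```

To save to disk, wrap a `StreamWriter` around the target path and pass it to `CsvExport.Write`. Both Excel and OpenOffice open .csv files, though on machines whose list separator is not a comma Excel may put everything in one column.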
You can use CSV, XML, or ADO
How To Use ADO.NET to Retrieve and Modify Records in an Excel Workbook With Visual Basic .NET
An XSLT transformation can also do the job. I use it to export WPF DataGrid data.
I have huge excel files that I have to open from web browser. It takes several minutes to load huge file. Is it possible to open a single worksheet (single tab) at a time from excel file that contains many worksheets? I have to do this using C# / asp.net MVC
I'm assuming you have the excel workbook on the server and just want to send a single worksheet to the client. Does the user then edit the worksheet? Will they be uploading it back?
Assuming this is just a report, why not use the OpenXML SDK to read the workbook, extract the sheet in question and send it back to the client? This is what @Jim was suggesting in the comments. You can get the SDK here: Open XML SDK 2.0 for Microsoft Office. However, I'm not sure if it will work with the 'old' Excel format. I assume you'll need to save the template workbook in the new Office format (xlsx).
Your question is slightly unclear as to where the spreadsheet is stored.
If it's on a server you control, process it, extracting sheets you need, and create other sheets which are smaller in size. (Or possibly save them in a different format.).
If they're not on a server you control, download the file using C#, then go through a similar process of extracting the sheet before opening it.
Having said that, I've dealt with some largish spreadsheets (20MB or so), and haven't really had a problem processing the entire spreadsheet as a whole.
So where is the bottleneck? Your network or possibly the machine you're running?
Use third party components.
We fought with server-side Excel generation for years and were defeated.
We bought third-party components and all the problems were gone.
From your question, it seems you want to improve load time by using (opening) the data from one worksheet instead of the whole workbook. If this is the case and you only want the data, then access the workbook using ADO.NET with OLEDB provider. (You can use threading to load each worksheet to improve load performance. For instance, loading three large data sets in three worksheets took 17 seconds. Loading each worksheet on a separate thread, loaded same data sets in 5 seconds.)
From experience, performance starts to really suffer with workbooks of 40MB or more. Especially, if workbooks contain many formulas. My largest workbook of 120MB takes several minutes to load. Using OLEDB access, I can load, access, and process the same data in a few seconds.
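A sketch of the one-worksheet-per-thread idea. It assumes the Microsoft ACE OLEDB 12.0 provider is installed with the same bitness as the process, and that the sheet names are known up front. `SheetLoader` is a made-up name, and the timings quoted above are from my workbooks, not something this snippet guarantees. Each task opens its own connection, since an `OleDbConnection` should not be shared across threads.

```csharp
using System.Data;
using System.Data.OleDb;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: loads worksheets through OLEDB, one task per sheet.
static class SheetLoader
{
    public static string ConnectionString(string path)
    {
        // HDR=YES treats the first row as column names; IMEX=1 asks the
        // driver to read mixed-type columns as text.
        return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path +
               ";Extended Properties=\"Excel 12.0 Xml;HDR=YES;IMEX=1\"";
    }

    public static DataTable LoadSheet(string path, string sheetName)
    {
        using (var conn = new OleDbConnection(ConnectionString(path)))
        using (var adapter = new OleDbDataAdapter(
            "SELECT * FROM [" + sheetName + "$]", conn))
        {
            var table = new DataTable(sheetName);
            adapter.Fill(table);   // Fill opens and closes the connection itself
            return table;
        }
    }

    public static DataTable[] LoadSheets(string path, params string[] sheetNames)
    {
        var tasks = sheetNames
            .Select(name => Task.Run(() => LoadSheet(path, name)))
            .ToArray();
        Task.WaitAll(tasks);
        return tasks.Select(t => t.Result).ToArray();
    }
}
```

For a single worksheet, `SheetLoader.LoadSheet(path, "Sheet1")` is all that is needed; the workbook never has to be opened in Excel.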
If you want the client to open the data in Excel, gather the data via ADO.NET/OLEDB, get it as XML and transform it into XMLSS using XSLT. This is easy, and there is plenty of documentation and samples.
If you just want to present the data, gather the data via ADO.NET/OLEDB, get it as XML and transform it into HTML using XSLT. This is also easy and well documented.
Be aware that the browser and computer become non-responsive with large data sets. I had to set an upper limit. If the limit was reached, I notified the user that the results were truncated; otherwise, the user thought the computer was "locked".
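A small sketch of the HTML variant using `XslCompiledTransform` from the framework. The stylesheet is a generic one written for this example: it assumes the XML has the shape `DataSet.GetXml()` produces, where each child of the root is a row and each grandchild a column value. `XmlToHtml` is a made-up name. An XMLSS version would follow the same pattern with a different stylesheet.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper: turns row/column XML into an HTML table via XSLT.
static class XmlToHtml
{
    // Every child of the root becomes a <tr>, every grandchild a <td>.
    const string Stylesheet = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='html' indent='no'/>
  <xsl:template match='/*'>
    <table>
      <xsl:for-each select='*'>
        <tr>
          <xsl:for-each select='*'>
            <td><xsl:value-of select='.'/></td>
          </xsl:for-each>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>";

    public static string Transform(string xml)
    {
        var xslt = new XslCompiledTransform();
        using (var sheetReader = XmlReader.Create(new StringReader(Stylesheet)))
            xslt.Load(sheetReader);

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(xml)))
            xslt.Transform(input, null, output);
        return output.ToString();
    }
}
```

Calling `XmlToHtml.Transform(dataSet.GetXml())` returns markup that can be written straight into the response.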
Take a look at this question in StackOverflow:
Create Excel (.XLS and .XLSX) file from C#
I think you can open your workbook on the server (inside your ASP.NET MVC application) and process only the specific worksheet you want. You can then send such worksheet to the user using NPOI.
The following post shows you how to do that using an ASP.NET MVC application:
Creating Excel spreadsheets .XLS and .XLSX in C#
You can't "tell" Excel, even via Interop, that you only want a single worksheet. There are many reasons for this, such as formulas, references and links between worksheets, which make the task impossible.
If you only want to read the data from the worksheet, maybe OLEDB Data Provider is the best option for you. Here is a full example: Reading excel file using OLEDB Data Provider
Otherwise, you will need to load the entire workbook into memory before doing anything with it.