I have a requirement to read and write to a shared Excel (xlsx) file using the Open XML SDK in C#.
I have updated the shared mode setting using the answer to this question on Stack Overflow, and the setting is updated in the created Excel file.
I have written a small program to insert data into this generated Excel file based on this.
I have tested this with 3 different users trying to write data at the same time over LAN.
I initially got an exception during open at the below statement.
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(filePath, true))
At that point only one user could write to the shared Excel file, even though the shared-workbook setting was enabled.
Later I changed the above statement to open the file through a Stream, as below:
using (FileStream excelStream = new FileStream(filePath, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite))
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(excelStream, true))
{
    // ... insert rows here ...
}
After this change the initial exception occurs less often, but it still happens occasionally. When the open succeeds, multiple users are able to write to the Excel file.
I have observed two important behaviors when multiple users write to a shared Excel file.
Even though the Users write the data concurrently, all the rows written by a user in one session are arranged in a sequence.
When multiple users try to write to the same shared Excel file, Open XML seems to write the data one user after another, in sequence. I have verified this by inserting a timestamp for each user while writing into the Excel file. Writing for user2 starts after the end of user1.
Can anyone please guide me to the right approach to eliminate the exception during open and also do concurrent writes to the Excel file using the Open XML SDK?
Thanks in advance.
You probably should consider using a database if you need multi-user access and expect it to work well. Even Access would be better than Excel. If you need the output to be in Excel, a report that can export to Excel could function for that.
If an exception is occurring while opening the file, maybe try handling that exception and adding retry logic? You can use something like https://github.com/App-vNext/Polly to make doing that easy.
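The retry idea can also be sketched without any library. This is only a rough illustration, not the asker's code: the class name `SharedFileOpener`, the attempt count and the delay are assumptions. It opens the file with `FileShare.None`, so a second writer fails fast with an `IOException` and waits, instead of two processes rewriting the same package at once.

```csharp
using System;
using System.IO;
using System.Threading;

public static class SharedFileOpener
{
    // Opens the file for exclusive read/write access, retrying while another
    // process holds it. Names and default values here are illustrative only.
    public static FileStream OpenWithRetry(string filePath, int maxAttempts = 5, int baseDelayMs = 200)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                // FileShare.None: only one writer at a time may hold the file.
                return new FileStream(filePath, FileMode.Open, FileAccess.ReadWrite, FileShare.None);
            }
            catch (IOException ex) when (attempt < maxAttempts
                                         && !(ex is FileNotFoundException)
                                         && !(ex is DirectoryNotFoundException))
            {
                // Sharing violation (or similar): back off a little longer each time.
                Thread.Sleep(baseDelayMs * attempt);
            }
        }
    }
}
```

The returned stream could then be passed to SpreadsheetDocument.Open(stream, true) as in the question. Once the last attempt fails, the exception propagates to the caller.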
The user sequencing is probably just how it's handling writes. I wouldn't expect this to be something you can work around since it's Excel, not a database.
Related
I know questions like this are around on Stack Overflow and there are 3rd party libraries to do the trick, but none of them is fixing my issue at the moment. So, the issue.
I have an Excel workbook (.xlsx) with multiple sheets generated by another system. I have to read the data from this via SSIS and dump it to a SQL DB.
Now the issue: the Excel workbook contains data, and when I open it manually it opens without any error and the data displays. But when I use a script task with an OLEDB connection, the connection is made successfully, yet when reading data the column names are not picked up (I get F1, F2 and so on) and no data rows are read. I simply get a blank row and that's about it. I have tried HDR=YES and NO and IMEX=1 and 0, but the result is always the same.
Funny thing is, if I open the Excel file, make some modification (like changing a sheet name, saving, changing the sheet name back, saving and closing) and then run the package, the data gets picked up without any issue (I also noticed that the file size increases from 164KB to 196KB). Because of this, what I am trying to do is modify the file a bit and save it via code.
So the initial step I tried was through using Office.Interop.Excel and it works like a charm in my machine but on the server NO OFFICE so IT NO WORKS. And nope the IT guys are never going to install access engine or excel or anything there.
Then I tried to modify the file via OpenXML, a 3rd party library (NPOI) and even an OLEDB connection. With both NPOI and OLEDB the file got changed, but it still didn't get picked up properly by the SSIS package (I noticed that the file size didn't change and remained at 164KB). OpenXML wasn't able to open the file at all and threw an error saying "the document cannot be opened because there is an invalid part with an unexpected content type".
So right now I am stuck with no proper method in sight and would appreciate any help in solving this, either through C# code or any other SSIS method available. The SSIS version I am using is 2008.
Edit 1
So I noticed that the script task is able to read the data from the first sheet out of the multiple sheets, but the other sheets are the problem. So somewhere the XML for these sheets is broken. Is there any way I can copy the XML configs of the first sheet to the other ones? Just a thought...
Edit 2
So the first sheet is of ContentType "application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml" while all the other sheets are of ContentType "application/xml"
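For anyone who wants to check this on their own file: an xlsx is a zip package, and the content type of each part is declared in its `[Content_Types].xml` entry. The sketch below lists the explicit `Override` entries. The class and method names are made up, and it assumes `System.IO.Compression.ZipArchive` is available (.NET 4.5 or later), which may rule out running it inside an SSIS 2008 script task. A sheet that is missing from the result falls back to the package's `Default` entry for the `xml` extension, which would be consistent with seeing "application/xml".

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class ContentTypeInspector
{
    // Namespace of the package content-types part.
    static readonly XNamespace Ns =
        "http://schemas.openxmlformats.org/package/2006/content-types";

    // Returns part name -> content type for every <Override> in [Content_Types].xml.
    public static Dictionary<string, string> ReadOverrides(Stream xlsx)
    {
        using (var zip = new ZipArchive(xlsx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("[Content_Types].xml");
            if (entry == null)
                throw new InvalidDataException("Package has no [Content_Types].xml entry.");

            using (Stream s = entry.Open())
            {
                XDocument doc = XDocument.Load(s);
                return doc.Root
                    .Elements(Ns + "Override")
                    .ToDictionary(
                        e => (string)e.Attribute("PartName"),
                        e => (string)e.Attribute("ContentType"),
                        StringComparer.OrdinalIgnoreCase);
            }
        }
    }
}
```

Calling ReadOverrides on a FileStream for the workbook and printing the pairs shows which worksheet parts carry the worksheet+xml type and which do not.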
Ultimately ended up using two libraries for this. The data was read without an issue by using exceldatareader (http://exceldatareader.codeplex.com/). Using this the data was read into a dataset easily and then it was written to a new Excel file using epplus (http://epplus.codeplex.com/).
After that when the new excel file was read via the SSIS package data got picked without an issue. Hope this will help someone out there.
I have to load huge amount of data, pre-process it, share it among few users and finally gather updates back from users.
This is what I did in my previous project -
Created an excel add-in using C++. Loaded the data in memory using the add-in code and processed it. For each type of data I have sent the processed data to a sheet and saved a new excel file. That way, if I have three types of data, I have created three new excel workbooks. My users then opened those new workbooks, made their changes and dropped a text file that contains their changes (through a button). The main excel keeps polling for those updates (text files) and loads them as soon as they are found. That's the way I get the updates back from my users.
I am not a fan of what I did in my previous project, it produces too many temporary files (of course I can delete those). In my current project I want to use C# VSTO Workbook so I can have more control over excel. I was hoping once I load the data, I will ask my users to open the same excel in Read-Only mode and they will make changes. While testing this, I realized user's excel (opened in read-only) mode does not see the loaded data. And their changes do not update the data held in memory. This probably means I have no idea what I am doing.
Do you guys have any idea how to achieve this? I will really appreciate any help/hint.
Excel supports a so-called "co-authoring" mode, where many people can edit the same document at the same time. But there might be a catch: afaik, you need SharePoint/Office Online Server/OneDrive for Business to support this scenario (you need a non-free office document server product).
Using VSTO, you can do just the same you have done with C++ add-in, but in C# (means, the set of capabilities is 1:1 - it basically just wraps C++ COM Excel API for .NET)
But for online version of Excel, there may be yet another alternative - javascript addins (now that's called "Office Addins", afaik). But I doubt you'd want to process your "huge amounts of data" with javascript.
So I would say, there is a good rule: Don't fix something that isn't broken :)
If the problem is the number of temporary files, files are not the only way to transfer data between applications. You know, you can connect two applications directly (so that they can exchange data with messages/updates). Use network, Luke :)
Of course if your 3 users live on 3 deserted islands, totally disconnected from anything, exchanging with text files on USB stick may still be the only viable option...
I think the "web" solution could be: store your file in some "co-authoring"-capable service (SharePoint, Google Sheets, OneDrive, Office Online, whatever). Make some web job to update that file in that storage automatically, just like a "fourth" user would do.
Good afternoon,
we have a small problem with the performance of generating Excel files.
First, we were creating the Excel file cell by cell, which is ... let's say unacceptable.
Second, we started inserting into Excel with one command by writing a whole range at once. It is much faster, but still not perfect, so we are searching for other solutions.
Because we can load an XML file from the database, we tried using XSLT to create an xls file from these two files. It works nicely, but when the file is opened an error message is shown (it is because of a problem or bug in the registry). The user has to accept this message, and only then does Excel open the file. We want to eliminate this error message, but we don't know how.
We were thinking about converting this xls file into xlsx, but we are unable to do it because we can't install Office on the server (we cannot use Interop) and the OpenXML libraries can't work with a normal xls file. So my question is:
Is it possible to generate an xlsx file from an XML file using some XSLT (or something similar)?
Alternatively, what files do we need to create and zip together if we want to create an xlsx file?
Thank you for information
You mention not being able to use the OpenXML libraries because they don't work with .xls files, but you also say "creating cell by cell", which implies that you are generating the file from scratch. Where is the xls file coming from? You mention excel opening, but then say you can't install it on the server. So, it appears to me that a user is uploading an xls file to your server, and then you are doing something with it and giving it back to them? If that is the case and you must be able to read/write an xls file without installing office, then I would suggest using ExcelLibrary, as mentioned in this post
Indeed, creating an xlsx file is orders of magnitude faster with the Open XML SDK.
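On the question of which files need to be zipped together: as far as I know, the smallest package Excel will open has five parts, which are the content-types list, the package relationships, the workbook, the workbook relationships and one worksheet. Below is a rough sketch using only `System.IO.Compression` (assuming .NET 4.5 or later). The class name `MinimalXlsx` and the sample sheet are illustrative, and the `sheetXml` argument stands in for whatever your XSLT produces, which would have to be valid SpreadsheetML.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class MinimalXlsx
{
    const string ContentTypes =
        "<Types xmlns='http://schemas.openxmlformats.org/package/2006/content-types'>" +
        "<Default Extension='rels' ContentType='application/vnd.openxmlformats-package.relationships+xml'/>" +
        "<Default Extension='xml' ContentType='application/xml'/>" +
        "<Override PartName='/xl/workbook.xml' ContentType='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml'/>" +
        "<Override PartName='/xl/worksheets/sheet1.xml' ContentType='application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml'/>" +
        "</Types>";

    const string RootRels =
        "<Relationships xmlns='http://schemas.openxmlformats.org/package/2006/relationships'>" +
        "<Relationship Id='rId1' Type='http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument' Target='xl/workbook.xml'/>" +
        "</Relationships>";

    const string Workbook =
        "<workbook xmlns='http://schemas.openxmlformats.org/spreadsheetml/2006/main' " +
        "xmlns:r='http://schemas.openxmlformats.org/officeDocument/2006/relationships'>" +
        "<sheets><sheet name='Sheet1' sheetId='1' r:id='rId1'/></sheets></workbook>";

    const string WorkbookRels =
        "<Relationships xmlns='http://schemas.openxmlformats.org/package/2006/relationships'>" +
        "<Relationship Id='rId1' Type='http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet' Target='worksheets/sheet1.xml'/>" +
        "</Relationships>";

    // Example worksheet with a single inline-string cell; replace with XSLT output.
    public const string SampleSheet =
        "<worksheet xmlns='http://schemas.openxmlformats.org/spreadsheetml/2006/main'>" +
        "<sheetData><row r='1'><c r='A1' t='inlineStr'><is><t>Hello</t></is></c></row></sheetData>" +
        "</worksheet>";

    // Writes the five parts into a zip package on the given stream.
    public static void Write(Stream output, string sheetXml)
    {
        using (var zip = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            AddPart(zip, "[Content_Types].xml", ContentTypes);
            AddPart(zip, "_rels/.rels", RootRels);
            AddPart(zip, "xl/workbook.xml", Workbook);
            AddPart(zip, "xl/_rels/workbook.xml.rels", WorkbookRels);
            AddPart(zip, "xl/worksheets/sheet1.xml", sheetXml);
        }
    }

    static void AddPart(ZipArchive zip, string name, string xml)
    {
        ZipArchiveEntry entry = zip.CreateEntry(name);
        // UTF-8 without a byte-order mark; the writer closes the entry stream.
        using (var writer = new StreamWriter(entry.Open(), new UTF8Encoding(false)))
            writer.Write(xml);
    }
}
```

For example, MinimalXlsx.Write(File.Create("out.xlsx"), MinimalXlsx.SampleSheet) followed by disposing the file stream should give a workbook with "Hello" in A1. Styles, shared strings and further sheets are extra parts and are left out here.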
I was using the Microsoft.Office.Interop.Excel in C# to create a custom .xlsx file.
In doing so I created a Workbook object. Due to the nature of complex SQL queries to grab the data, process it, and apply via Interop the custom styles and formatting the code is very lengthy. Not to mention the very careful process of avoiding memory leaks from the Interop itself, and ensuring that Excel actually closes properly after running.
I originally was testing it out as a console application, and got it working to my satisfaction. What it does is save the end result to the filesystem using the SaveAs member.
However, my next goal was to instead redirect the output as an output stream to asp.net similar to this question here. I've done some rudimentary research and I cannot seem to find an approach that does not involve first saving the Workbook to the server's file system. This may cause conflicts if several users are accessing at the same time, etc.
So my question is, is there an easy way to set the asp.net ContentType for .xlsx and stream out the Workbook object without saving it to the file system? If not, is there a way asp.net can save temporary files automatically without conflicts, serve the temp file, and then delete the temp file after it's been served?
I agree with the comments that you should avoid using Excel Interop server-side, and the third party libraries I've used (EPPlus, Aspose) all support streaming the output. However, if you want to save temporary files without conflict you can use Path.GetTempFileName.
If your ASP.NET app is running under an account without a profile, you may need to give it write access to %WINDIR%\Temp or whatever temporary directory it uses.
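To illustrate the temp-file route, here is a minimal sketch in which the class name `TempWorkbook` and the `saveWorkbookTo` callback are hypothetical. The callback stands in for whatever writes the workbook to a path, for example the Interop SaveAs call. `Path.GetTempFileName` gives each request its own file, and `FileOptions.DeleteOnClose` removes the file when the stream is disposed.

```csharp
using System;
using System.IO;

public static class TempWorkbook
{
    // Creates a uniquely named temp file, lets the caller write the workbook into it,
    // and returns a read stream that deletes the file when it is closed.
    public static FileStream CreateSelfDeleting(Action<string> saveWorkbookTo)
    {
        string path = Path.GetTempFileName(); // creates an empty, uniquely named file
        try
        {
            saveWorkbookTo(path);
            return new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read,
                                  4096, FileOptions.DeleteOnClose);
        }
        catch
        {
            // Do not leave the temp file behind if saving or opening failed.
            File.Delete(path);
            throw;
        }
    }
}
```

In an MVC action the stream could be returned with File(stream, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", "report.xlsx"), where the file name is just an example. The framework disposes the stream after sending it, which in turn deletes the temp file. Since GetTempFileName has already created an empty file, the save step has to overwrite it.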
I have huge excel files that I have to open from web browser. It takes several minutes to load huge file. Is it possible to open a single worksheet (single tab) at a time from excel file that contains many worksheets? I have to do this using C# / asp.net MVC
I'm assuming you have the excel workbook on the server and just want to send a single worksheet to the client. Does the user then edit the worksheet? Will they be uploading it back?
Assuming this is just a report, why not use the Open XML SDK to read the workbook, extract the sheet in question and send it back to the client? This is what @Jim in the comments was suggesting. You can get the SDK here: Open XML SDK 2.0 for Microsoft Office. However, I'm not sure if it will work with the 'old' Excel format. I assume you'll need to save the template workbook in the new Office format (xlsx).
Your question is slightly unclear as to where the spreadsheet is stored.
If it's on a server you control, process it, extracting sheets you need, and create other sheets which are smaller in size. (Or possibly save them in a different format.).
If it's not on a server you control, download the file using C#, then go through a similar process of extracting the sheet before opening it.
Having said that, I've dealt with some largish spreadsheets (20MB or so), and haven't really had a problem processing the entire spreadsheet as a whole.
So where is the bottleneck? Your network or possibly the machine you're running?
Use third party components.
We fought with server-side Excel generation for years and were defeated.
We bought third-party components and all the problems went away.
From your question, it seems you want to improve load time by using (opening) the data from one worksheet instead of the whole workbook. If this is the case and you only want the data, then access the workbook using ADO.NET with OLEDB provider. (You can use threading to load each worksheet to improve load performance. For instance, loading three large data sets in three worksheets took 17 seconds. Loading each worksheet on a separate thread, loaded same data sets in 5 seconds.)
From experience, performance starts to really suffer with workbooks of 40MB or more. Especially, if workbooks contain many formulas. My largest workbook of 120MB takes several minutes to load. Using OLEDB access, I can load, access, and process the same data in a few seconds.
If you want the client to open data in Excel, gather data via ADO.NET/OLEDB, get XML and transform into XMLSS using Xslt. Which is easy and there is much documentation and samples.
If you just want to present the data, gather data via ADO.NET/OLEDB, get XML and transform into HTML using Xslt. Which is easy and there is much documentation and samples.
Be aware that the browser and computer become non-responsive with large data sets. I had to set an upper limit. If the limit was reached, I notified the user that the results were truncated; otherwise, the user thought the computer was "locked".
Take a look at this question in StackOverflow:
Create Excel (.XLS and .XLSX) file from C#
I think you can open your workbook on the server (inside your ASP.NET MVC application) and process only the specific worksheet you want. You can then send such worksheet to the user using NPOI.
The following post shows you how to do that using an ASP.NET MVC application:
Creating Excel spreadsheets .XLS and .XLSX in C#
You can't "tell" Excel, even via Interop, that you only want a single worksheet. There are a lot of reasons for this, like formulas, references and links between worksheets, which make the task impossible.
If you only want to read the data from the worksheet, maybe OLEDB Data Provider is the best option for you. Here is a full example: Reading excel file using OLEDB Data Provider
Otherwise, you will need to load the entire workbook into memory before doing anything with it.