Need to compare two excel files - c#

I have to compare two Excel files that differ in their number of sheets and number of columns/rows, and that contain merged cells, formulas, etc.
The output of the comparison should let the user see which rows/columns were added, deleted, modified, or re-arranged. The whole thing has to be done in C#/.NET.
Can anyone help me out?

You need to define your scope better. As @Brijeesh suggested, there are third-party libraries such as EPPlus and NPOI for reading the files. But this is not about reading the files; it is about performing some "comparison" whose behavior so far exists only in your head.
This question is too broad to be answered in a single thread like this.
Are the spreadsheets in a known format?
Do you only want to check certain sheets/columns?
You mentioned wanting to see which columns were re-arranged. This implies knowing what order they were in to begin with. Again: is the spreadsheet in a known format?
You should start to answer these questions, and trim down the scope of what you're trying to do into manageable tasks.
EDIT:
VSTO has its benefits and its downsides (the user must have Excel installed). Third-party libraries also have their ups and downs. Which method you should use for accessing the spreadsheets was not the question in this thread.
Again - focus on one thing at a time. What do you want to compare, and how do you want to do it? Then you can worry about which library you should use. Though - when you're ready to ask that question, open a separate question so that the posts can stay on topic.
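To illustrate what a narrowed-down scope might look like: once the task is reduced to "compare one sheet against another, row by row", the comparison itself no longer depends on which library reads the files. Below is a minimal sketch in Python (chosen only for brevity; the logic ports directly to C#). It assumes each sheet has already been loaded into a list of rows, and it uses the standard-library difflib.SequenceMatcher. The function name diff_rows and the sample data are hypothetical.

```python
from difflib import SequenceMatcher


def diff_rows(old_rows, new_rows):
    """Classify row-level differences between two sheets.

    Each sheet is assumed to be already loaded as a list of rows,
    where a row is a tuple of cell values (tuples are hashable).
    Returns a list of (change, old_row, new_row) tuples.
    """
    changes = []
    matcher = SequenceMatcher(a=old_rows, b=new_rows, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "delete":
            changes += [("deleted", row, None) for row in old_rows[i1:i2]]
        elif tag == "insert":
            changes += [("added", None, row) for row in new_rows[j1:j2]]
        else:
            # "replace": pair rows up in order; leftovers are added/deleted.
            old_block, new_block = old_rows[i1:i2], new_rows[j1:j2]
            paired = min(len(old_block), len(new_block))
            for old, new in zip(old_block, new_block):
                changes.append(("modified", old, new))
            changes += [("deleted", row, None) for row in old_block[paired:]]
            changes += [("added", None, row) for row in new_block[paired:]]
    return changes


if __name__ == "__main__":
    old = [("id", "name"), (1, "a"), (2, "b"), (3, "c")]
    new = [("id", "name"), (1, "a"), (2, "B"), (3, "c"), (4, "d")]
    for change in diff_rows(old, new):
        print(change)
```

Note that this sketch reports a re-arranged row as a deletion plus an addition; detecting moves, merged cells, or formula changes would each need its own rule, which is exactly why the scope has to be pinned down first.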

Have a look at EPPlus; if that doesn't fit your requirements, use VSTO/Office interop.

I created a spreadsheet macro that compares two spreadsheets and highlights the differences. You have to log in to download it, but you can use your Google login.
The VBA code is not locked, and it might give you a starting point for porting it to C#.
http://www.run8tech.com/tools.aspx

Related

Is there a way to expose custom data from c# application so that Excel can not just show, but also modify the data? [closed]

Are there any options for implementing a service that Excel can consume, so that the user can modify or add data in Excel and have those changes communicated back to the service?
I read some articles about OData, but it seems to be a one-way export to Excel; data modification is not supported.
There are several ways to expose data from C# to Excel so that you can edit the information in Excel and pass it into the C# world (... and beyond, e.g. to a database server, to an entirely different system, etc etc).
One way -- which happens to be an inherent Excel feature -- is to build an RTD server that wraps a C# dll. You could use a tool like Excel-DNA, which will take care of the complexities of writing the RTD wrapper. One nice feature of RTD is that it's quite asynchronous, which can be a performance booster at times. One pain point of RTD is that the asynchronous-ness is managed on a cell-by-cell basis, which means that if the cells relate to each other in any way (e.g. one cell contains the part number, another contains the quantity, and yet another the price, and you need all three to get anything done) then you have to write some clever logic to accommodate this. Not impossible, but definitely a pain point. Usually what we would do is create a very hidden sheet that contains the RTD formulas to watch over a different sheet for changes, and handle the changes accordingly.
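The "clever logic" for related cells can amount to buffering the cell-by-cell updates until a complete record is available. Here is a minimal sketch of that idea in Python (for brevity only; this is not RTD or Excel-DNA API code). It assumes each update arrives as a (row, field, value) triple; the class name RecordAssembler and the field names are hypothetical.

```python
class RecordAssembler:
    """Buffers cell-by-cell updates and emits a record once every
    required field for a row has arrived."""

    def __init__(self, required_fields, on_complete):
        self.required = frozenset(required_fields)
        self.on_complete = on_complete  # callback(row, record_dict)
        self.pending = {}               # row -> {field: value}

    def update(self, row, field, value):
        record = self.pending.setdefault(row, {})
        record[field] = value
        if self.required.issubset(record):
            # All related cells are present: reset the buffer for this
            # row and hand a copy of the record to the callback.
            del self.pending[row]
            self.on_complete(row, dict(record))


if __name__ == "__main__":
    assembler = RecordAssembler(
        ["part", "qty", "price"],
        lambda row, rec: print("row", row, "is complete:", rec),
    )
    assembler.update(5, "part", "A-1")
    assembler.update(5, "qty", 3)
    assembler.update(5, "price", 9.5)  # triggers the callback
```

The callback fires only when the last of the three fields arrives, regardless of the order in which the individual cell updates come in.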
Another approach would be to write what are called UDFs. Similar challenges to RTD, but you get a reference to the entire worksheet rather than one cell at a time, so the pain point is lessened a bit. UDFs are not asynchronous out of the box, but you can definitely make them behave asynchronously. This is pretty important: if you have a long-running operation inside the UDF and you don't run it on a background thread, Excel will freeze until the function completes. This could lead to users thinking you broke Excel (LOL). Note this is not an issue with RTDs, since they are already asynchronous. There are several ways you could create a UDF; one of them is again Excel-DNA, but you should explore your options before going headlong down that route. The actual implementation of change handling is similar to RTD: create a very hidden sheet and scan the target worksheet.
There is yet another approach, which would be to create a VSTO add-in. This basically integrates your C# dll into Excel. You can create toolbar elements, intercept Excel events -- essentially you can interact with Excel on a very low level. Pretty much anything you can do in Excel VBA, you can do with VSTO, and more. Handling change here is done at the add-in side (i.e. your code-behind .cs files) and the world is your oyster at that point. I usually write a cell changed event handler, but you can capture changes at pretty much any point that you want. You can even defer handling the data change until the user presses the save button - I used to do this, because it was a pretty straightforward paradigm for the end-user.
Those would be the 3 options that come to the top of my mind.
You might take a look at the Microsoft.Office.Interop.Excel namespace.
Here is a good explanation of what one could do with it.
Try the Excel-DNA library http://exceldna.codeplex.com/
Better to avoid the Microsoft.Office.Interop libraries. Disposing of their COM references correctly is very difficult, and they require Office (in this case Excel) to be installed on your machine (and, even worse, on app servers, etc.).

ASP.NET web application I Can’t Access after IIS hosting [duplicate]

A client wants to "Web-enable" a spreadsheet calculation: the user specifies the values of certain cells, and is then shown the resulting values in other cells.
(They do NOT want to show the user a "spreadsheet-like" interface. This is not a UI question.)
They have a huge spreadsheet with lots of calculations over many, many sheets. But, in the end, only two things matter -- (1) you put numbers in a couple cells on one sheet, and (2) you get corresponding numbers off a couple cells in another sheet. The rest of it is a black box.
I want to present a UI to the user to enter the numbers they want, then I'd like to programmatically open the Excel file, set the numbers, tell it to re-calc, and read the result out.
Is this possible/advisable? Is there a commercial component that makes this easier? Are there pitfalls I'm not considering?
(I know I can use Office Automation to do this, but I know it's not recommended to do that server-side, since it tries to run in the context of a user, etc.)
A lot of people are saying I need to recreate the formulas in code. However, this would be staggeringly complex.
It is possible, but not advisable (and officially unsupported).
You can interact with Excel through COM or the .NET Primary Interop Assemblies, but this is meant to be a client-side process.
On the server side, no display or desktop is available, and any unexpected dialog box (for example) will make your web app hang, so your app will be flaky.
Also, attaching an Excel process to each request isn't exactly a low-resource approach.
Working out the black box and re-implementing it in a proper programming language is clearly the better (as in "more reliable and faster") option.
Related reading: KB257757: Considerations for server-side Automation of Office
You definitely don't want to be using interop on the server side, it's bad enough using it as a kludge on the client side.
I can see two options:
1. Figure out the spreadsheet logic. This may benefit you in the long term by making the business logic a known quantity, and in the short term you may find that there are actually bugs in the spreadsheet (I have encountered tons of monster spreadsheets used for years that turn out to have simple bugs in them; everyone just assumed the answers must be right).
2. Evaluate SpreadSheetGear.NET, which is basically a replacement for interop that does it all without Excel (it replicates a huge chunk of Excel's non-visual logic and IO in .NET).
Although this is certainly possible using ASP.NET, it's very inadvisable. It's un-scalable and prone to concurrency errors.
Your best bet is to analyze the spreadsheet calculations and duplicate them. Now, granted, your business is not going to like the time it takes to do this, but it will (presumably) give them a more usable system.
Alternatively, you can simply serve up the spreadsheet to users from your website, in which case you do almost nothing.
Edit: If your stakeholders really insist on using Excel server-side, I suggest you take a good hard look at Excel Services, as @John Saunders suggests. It may not get you everything you want, but it'll get you quite a bit, and should solve some of the issues you'll end up with trying to do it server-side with ASP.NET.
That's not to say that it's a panacea; your mileage will certainly vary. And SharePoint isn't exactly cheap to buy or maintain. In fact, short-term costs could easily be dwarfed by long-term costs if you go the SharePoint route, but it might be the best option to fit a requirement.
I still suggest you push back in favor of coding all of your logic in a separate .NET module. That way you can use it both server-side and client-side. Excel can easily pass calculations to a COM object, and you can very easily publish your .NET library as COM objects. In the end, you'd have a much more maintainable and usable architecture.
Setting aside the discussion of whether it makes sense to manipulate an Excel sheet on the server side, one way to do this would be to use Microsoft.Office.Interop.Excel.dll.
Using this library, you can tell Excel to open a spreadsheet, then change and read its contents from .NET. I have used the library in a WinForms application, and I guess that it can also be used from ASP.NET.
Still, consider the concurrency problems already mentioned. However, if the sheet is accessed infrequently, why not?
The simplest way to do this might be to:
Upload the Excel workbook to Google Docs -- this is very clean, in my experience
Use the Google Spreadsheets Data API to update the data and return the numbers.
Here's a link to get you started on this, if you want to go that direction:
http://code.google.com/apis/spreadsheets/overview.html
Let me be more adamant than others have been: do not use Excel server-side. It is intended to be used as a desktop application, meaning it is not intended to be used from random different threads, possibly multiple threads at a time. You're better off writing your own spreadsheet than trying to use Excel (or any other Office desktop product) from a server.
This is one of the reasons that Excel Services exists. A quick search on MSDN turned up this link: http://blogs.msdn.com/excel/archive/category/11361.aspx. That's a category list, so contains a list of blog posts on the subject. See also Microsoft.Office.Excel.Server.WebServices Namespace.
It sounds like you mean that the user has the spreadsheet open on their local system, and you want a web site to manipulate that local spreadsheet?
If that's the case, you can't really do that. Even Office automation won't help, unless you want to require them to upload the sheet to the server and download a new altered version.
What you can do is create a web service to do the calculations and add some VBA or VSTO code to the Excel sheet to talk to that service.
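As a rough illustration of the service half of that arrangement, here is a minimal sketch in Python using only the standard library (the real service would more likely be written in .NET). It accepts a JSON POST and returns the computed values as JSON. The calculate function, its formula, the field names, and the port are all hypothetical placeholders for whatever the spreadsheet actually computes.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def calculate(inputs):
    """Placeholder for the real business logic (hypothetical formula)."""
    return {"total": inputs["quantity"] * inputs["unit_price"]}


class CalcHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            result = calculate(json.loads(self.rfile.read(length)))
            status, body = 200, json.dumps(result)
        except (ValueError, KeyError, TypeError) as exc:
            # Malformed JSON or missing/invalid inputs.
            status, body = 400, json.dumps({"error": str(exc)})
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Run the service on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), CalcHandler).serve_forever()


if __name__ == "__main__":
    serve()
```

The workbook side would then post the input cells to this endpoint and write the response back into the result cells; keeping calculate as a plain function means the same logic can also be called without going through HTTP.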

Excel as datasource?

I'm facing a huge challenge with some Excel sheets that were written in the 90s for financial calculations. There are around 100 different Excel sheets (different in structure, content, formulas, macros, etc.), and they need to be integrated into one single application so that all of those Excel files can be decommissioned.
Some of the features of the Excel sheets:
Every Excel sheet has a defined number of worksheets with formulas and macros associated with them. The macros are written in VBA and make extensive use of add-ins in the form of XLLs.
Every Excel sheet also has some user-defined worksheets, which rely on the user's input data and the predefined sheets to compute instrument prices. Most importantly, the user enters data into the user-created sheets, which is referenced in the predefined sheets. The predefined sheets calculate something based on this, and the result is in turn referenced in the user-defined sheets.
We were thinking of various approaches to decommission these, as below:
Use Excel Services with SharePoint 2010, expose the Excel sheet as a web service, and write a client application that does the user-defined part and then communicates with Excel Services for the result. (It will also have to write to the predefined sheets hosted in Excel Services.)
Use Excel automation (??)
My question is: what would be the most feasible approach to decommissioning those Excel files? All I have to do is write to Excel cells, let the formulas/macros play their part to compute the result, and get the results back. We are also very keen not to re-create the whole Excel setup, i.e. re-create the formulas/add-ins that are in place, but rather to consolidate them.
Any advice would be much appreciated!!!
Cheers,
-Mike
If we ignore particular technologies, you have three basic options although combinations are possible:
1. Keep the existing Excel workbooks in the background and provide a new front-end.
2. Migrate the VBA to some new tool.
3. Identify with your users the functionality they need today and tomorrow, and implement that.
I would be surprised if Option 1 did not give you the cheapest and quickest integration. The downside is that you remain dependent on those old Excel workbooks. Are they based on early versions of Excel or have they been migrated forward? If not, do they work with the latest hardware and operating systems? Will they work with tomorrow's hardware and operating systems? Microsoft makes enormous efforts to maintain backward compatibility but that does not mean they achieve 100% backward compatibility nor that they can or will maintain backward compatibility in future.
Is Option 2 feasible? Do you know what each workbook does? The VBA can be migrated/converted to a new tool but what about the Add-ins? Do you have the source? Do you have the documentation? The trouble with relying on the documentation is that some bugs become undocumented features that people use.
With Option 3 you need not worry about bringing anything forward. You pick the right tool for today and tomorrow and use it to implement the functionality you need. You have to be slightly careful. If you ask users if they need facility X, they will say yes regardless. The question to ask is: what do you use facility X for?
Each of these options has risks. Do you have the information to assess the risks? Can you quantify the work and risks associated with each of these options? Only when you can will you be able to select an approach and then a technology.
I am not familiar with SharePoint, so I can't compare, but with Excel automation it is certainly possible to open a sheet, enter some values, and read the results. It can even be done visibly, as if somebody had opened Excel on the screen and were working in it. I would use a modern .NET language for this, if your old sheets are compatible.

Probability of copy/pasting from datagridview to excel breaking in future excel/.net versions?

If I built an application that expects the user to use the copy-paste functionality from a DataGridView to Excel (as a simple export), do I run a big risk that this will no longer work, or will behave strangely, in future builds of .NET or versions of Excel?
Per this MSDN article on copying data from a DataGridView, the control will set onto the clipboard both tab-delimited string data and an HTML table representing the data that was copied. These being relatively standard formats, as well as a relatively standard practice for exchanging data, you should be safe.
Of course, nobody can say with certainty what Microsoft will do in future versions of their products, so there's always a chance that something will break later on. However, Microsoft's pretty good at making things sufficiently backwards-compatible.
There is always a risk relying on clipboard for an export implementation:
I wouldn't say it is big but you are effectively asking for trouble when exporting this way - depending on what's installed on the system (for example some background clipboard watching app) or different versions of Excel or usage of your application in a Terminal Server situation etc. - the list of things that can go wrong is "endless" IMHO.
Using some mechanism to write a real file (XLS or XLSX) is really the way to go.
There are several free and commercial libraries (which don't require Excel to be present) out there to write Excel files (some can even export the content of a DataGridView with less than 10 lines of code!):
OpenXML from MS (free)
ClosedXML (free)
EPPlus (free)
SpreadsheetGear (commercial)
Aspose.Cells (commercial)
LibXL (commercial)
Flexcel (commercial)
You can even export in "Excel-HTML format" (sample source and MS documentation).
Yes, in a way you always run that risk when you assume that certain functionality of a third-party program will stay around. However, Microsoft has a long history of keeping its software backward compatible for a long while, so this feature is unlikely to be removed suddenly in the next year or so.
If you must remain compatible, use another method for exchanging data.

ASP.NET MVC / C# - How to Convert/Create Excel (xls) with formatting from DataTable/List in MVC?

I was able to create Excel xls file from DataTable following this link. But how do I format the contents of the XLS file?
Thanks.
I have personally used NPOI with good results.
Using this technique you cannot format the contents. You will need to generate a real Excel binary file using some third party control for this purpose. Here's one worth checking out. You may also take a look at this article.
The solution in that link simply creates a tab-separated value file, which can only contain the cell data, not the formatting information. If you need the formatting then you either save HTML markup with an extension of .xls (not an approach I'd normally recommend), or you need to create a "genuine" Excel file... but I don't know what library options you have in C# to do this (other than using .net
However, there's a number of libraries listed in the responses to this previous question
If you prefer to avoid 3rd party libraries, you have two other reasonable alternatives:
Personally I'm a fan of SpreadsheetML (one of the formats supported by Excel 2003+), since it's relatively easy to just manually generate one from scratch. Just create a file that looks right and save to SpreadsheetML format, then use that as a guideline when generating your own.
An alternative is to start with a preformatted "clean" Excel document (with table headings but no data). You can then insert rows of data into it using the Jet.OLEDB provider.
This has numerous disadvantages over using a 3rd party library:
1. Cannot format the data you are inserting. Only allows formatting of the table headings etc.
2. Compatibility issues on some versions of Windows.
3. Handles some types of data poorly.
It has a few advantages over using a 3rd party library:
1. From Microsoft. Fewer bugs, free, less risk of malicious code.
2. Very simple to use, assuming you're already comfortable working with SQL.
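For the SpreadsheetML route mentioned above, the generated file is just text, so formatting can be emitted alongside the data. The following is a minimal sketch in Python (for brevity; the same string-building works in C#) that writes an Excel 2003 XML Spreadsheet with a bold header row. The function name to_spreadsheetml, the style ID "header", and the sample data are hypothetical, and only a small subset of the format is shown.

```python
from xml.sax.saxutils import escape, quoteattr


def to_spreadsheetml(headers, rows, sheet_name="Sheet1"):
    """Build a SpreadsheetML (Excel 2003 XML) document with a bold header row."""

    def cell(value, style=None):
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        kind = "Number" if is_number else "String"
        style_attr = f' ss:StyleID="{style}"' if style else ""
        return (f'<Cell{style_attr}><Data ss:Type="{kind}">'
                f'{escape(str(value))}</Data></Cell>')

    lines = [
        '<?xml version="1.0"?>',
        '<?mso-application progid="Excel.Sheet"?>',
        '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"',
        '          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">',
        ' <Styles>',
        '  <Style ss:ID="header"><Font ss:Bold="1"/></Style>',
        ' </Styles>',
        f' <Worksheet ss:Name={quoteattr(sheet_name)}>',
        '  <Table>',
        '   <Row>' + "".join(cell(h, "header") for h in headers) + '</Row>',
    ]
    for row in rows:
        lines.append('   <Row>' + "".join(cell(v) for v in row) + '</Row>')
    lines += ['  </Table>', ' </Worksheet>', '</Workbook>']
    return "\n".join(lines)


if __name__ == "__main__":
    xml_text = to_spreadsheetml(["Name", "Qty"], [["Widget", 3], ["A & B", 1.5]])
    with open("export.xml", "w", encoding="utf-8") as handle:
        handle.write(xml_text)
```

As suggested above, the easiest way to find the markup for other formatting (number formats, column widths, colors) is to style a workbook by hand in Excel, save it as XML Spreadsheet, and copy the relevant elements.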
