OpenXML: removing #REF from the XML in an Excel file with C#

Issue: Corrupt named ranges can accumulate in the workbook XML, slow the workbook down, and cause error messages when worksheets are copied between workbooks. A workbook can contain only one range with a given name, so if two worksheets containing identical range names are brought together, Excel produces an error.
One way named ranges become useless or corrupted is when the cells they refer to are deleted. They can be identified by the #REF! error that appears in place of their reference.
My goal is to remove every #REF from the XML inside my Excel file. I plan on using DocumentFormat.OpenXml. Currently I make a copy of my old Excel file, and I then want to be able to access the XML within the new copy:
System.IO.File.Copy(txbBrowse.Text, Path.Combine(Path.GetDirectoryName(txbBrowse.Text), Path.GetFileNameWithoutExtension(txbBrowse.Text) + "Repair" + DateTime.Now.ToString("yyyy-MM-dd-HH-mm-ss") + Path.GetExtension(txbBrowse.Text)));
I'm currently using WinForms in C#.
I can't seem to find any links on how to achieve my goal. Any ideas on how I would go about opening and editing the XML so that all the #REF entries are removed from the new copy of the Excel file? I would appreciate any advice on this matter, or a few links.
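A minimal sketch, not tested against real corrupted workbooks. Because an .xlsx file is a zip package, the broken names can be removed from the copy using only the BCL (System.IO.Compression and System.Xml.Linq) instead of the Open XML SDK. It assumes the workbook part sits at the conventional path xl/workbook.xml, and it treats any <definedName> whose text contains "#REF!" as broken. With DocumentFormat.OpenXml the same idea would go through WorkbookPart.Workbook.DefinedNames. Any formula that still uses a removed name will show #NAME? afterwards.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class BrokenNameCleaner
{
    // SpreadsheetML main namespace used by the elements in xl/workbook.xml.
    private static readonly XNamespace Main =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Removes every <definedName> whose formula text contains "#REF!" and
    // returns the names that were removed. Assumes the workbook part lives at
    // the conventional path xl/workbook.xml (Excel's default, not a guarantee).
    public static string[] RemoveRefNames(string xlsxPath)
    {
        using var zip = ZipFile.Open(xlsxPath, ZipArchiveMode.Update);
        var entry = zip.GetEntry("xl/workbook.xml")
            ?? throw new InvalidDataException("xl/workbook.xml not found");

        XDocument doc;
        using (var input = entry.Open())
            doc = XDocument.Load(input);

        var broken = doc.Descendants(Main + "definedName")
            .Where(n => n.Value.Contains("#REF!"))
            .ToList();
        var removed = broken.Select(n => (string)n.Attribute("name")).ToArray();
        if (broken.Count == 0)
            return removed; // nothing to do; leave the package untouched

        broken.ForEach(n => n.Remove());

        // An empty <definedNames> container is useless, so drop it as well.
        var container = doc.Root?.Element(Main + "definedNames");
        if (container != null && !container.HasElements)
            container.Remove();

        // Replace the entry instead of overwriting it in place, so no stale
        // bytes from the longer original XML are left at the end.
        entry.Delete();
        using (var output = zip.CreateEntry("xl/workbook.xml").Open())
            doc.Save(output);
        return removed;
    }
}
```

Usage: call BrokenNameCleaner.RemoveRefNames with the destination path you build for File.Copy, after the copy has finished, so the original file is never modified.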

Related

EPPlus Excel Exporting hides some tabs

I am using C# with EPPlus library.
The task is that I am opening an excel file, removing some sheets from it and saving it as a separate excel file.
The issue I am facing is that, in the saved output file, the sheet tabs at the bottom are pushed out of view.
The active sheet is Sheet 1, but the Sheet 1 tab is not visible in the Excel file.
Is there any EPPlus command that can fix this so the tabs are visible again?
Files saved by Excel can contain information about the width of the tab area. I am quite sure that EPPlus does not take care of this detail.
If you rename a .xlsx file to .zip and open the xl\workbook.xml file inside it, you will find a <bookViews> section whose <workbookView> element may contain a 'tabRatio' attribute. The attribute is present only if the width was adjusted manually. This attribute, or the whole section, may be deleted.
I did a test: I saved a file with too little space for the tabs and after deletion of the section the tab area looked fine again.
Maybe EPPlus generates the 'tabRatio' attribute but with useless values? Please check for it. If so, you might want to file a request for the developers to leave it out if it does not make sense.
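If a stale tabRatio does turn out to be the cause, the manual fix described above can be automated as a post-processing step on the saved file. This is a sketch under assumptions: it uses the BCL zip and LINQ to XML classes rather than any EPPlus API, assumes the workbook part is at xl/workbook.xml, and removes only the attribute, leaving the rest of <bookViews> in place.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class TabRatioFixer
{
    private static readonly XNamespace Main =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Removes the tabRatio attribute from every <workbookView> in
    // xl/workbook.xml (assumed location) and returns how many were removed.
    public static int RemoveTabRatio(string xlsxPath)
    {
        using var zip = ZipFile.Open(xlsxPath, ZipArchiveMode.Update);
        var entry = zip.GetEntry("xl/workbook.xml")
            ?? throw new InvalidDataException("xl/workbook.xml not found");

        XDocument doc;
        using (var input = entry.Open())
            doc = XDocument.Load(input);

        var ratios = doc.Descendants(Main + "workbookView")
            .Select(view => view.Attribute("tabRatio"))
            .Where(attr => attr != null)
            .ToList();
        if (ratios.Count == 0)
            return 0; // nothing to change

        ratios.ForEach(attr => attr.Remove());

        // Rewrite the part as a fresh entry so no old bytes remain.
        entry.Delete();
        using (var output = zip.CreateEntry("xl/workbook.xml").Open())
            doc.Save(output);
        return ratios.Count;
    }
}
```

Run it on the output file only after EPPlus has finished saving and released it.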

How do I use a relative path to point to a file from which I wish to import data to my Data Model?

What I'm really trying to achieve
We have an Excel dashboard which is built to be used in tandem with an Export from our in-house application. The generation of the data Excel file is done with EPPlus and we were convinced that we could use the Excel Dashboard as the "base" file for the generation, so that the exported file contained the dashboard in one worksheet and the dataset in another worksheet.
So the user would receive everything they needed in one single file.
We sort of got going with this thanks to named ranges which dynamically adapt to the size of the dataset, covered here
However, we've found that since the Data Model in the Excel dashboard file is an OLAP cube, we can't update the underlying dataset and save it with EPPlus; instead we receive the error "the cache source is not a worksheet" when trying to save the sheet.
So while we worked on that, we sought out a temporary solution... where we've also hit a snag.
The new problem
The temporary solution we thought of is to distribute the Excel dashboard and the data as two separate Excel files. The Dashboard is distributed to those who need it and the data exports can then be generated from our application.
We thought that the only downside of this would be that it would require the user to rename and place the data file side by side with the Excel Dashboard manually.
However, we are running into an issue where Excel is insisting on using the absolute path to the data file instead of a relative path.
As a result, the user has to point the data source to the data export manually.
I'm describing the entire process to also give some insight into the way we've set it up, since I'm not sure I'm using the right words for the technical details, and perhaps I'm grossly mistaken in the approach.
Overall, the way it works
File A contains the dashboard and some sheets with pivot tables used by the dashboard. The pivot tables all work off the aforementioned Data Model in the workbook, which is a cube (we need it to be a cube for some of the functions we use in the pivot tables). The Data Model is based on a named range covering all the data in one of the worksheets.
File B is the one generated by our application. The two files were placed side by side when designing the dashboard and mapping the data since I was of the impression that Excel attempted to keep relative file paths when possible. The file consists of a single worksheet containing the exported data.
At this point, the challenge was to automatically pull in data from File B into the worksheet in File A. I did this by going to the Data tab and using the Get Data function to point it to File B and telling Excel to load the data from the specified sheet.
This worked like a charm although the "solution" seems overly complicated.
All was well until we tried to use the workbook from another machine/directory. Then we found that the path to File B was absolute and that the data file could no longer be found.
So, the short question after this long-winded explanation is: 'How can I have Excel use a relative path to another file when importing data from an external file with the "Get Data" function?'
By getting more familiar with Power Query I've been able to come up with a solution to this.
My setup is still as described above.
First, I have a sheet with some "system" values I use various places in the Workbook. I added a field there with the following Excel formula:
=LEFT(CELL("filename");FIND("[";CELL("filename");1)-1)
This provides me with the absolute path to the folder where my File A resides.
I use this value, and concatenate the expected filename of File B which holds the data. The result is an absolute path, pointing to where I expect the data file to be located.
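To make the formula's string logic concrete, here is the same computation in C#. It is only an illustration (Excel evaluates the formula itself). CELL("filename") returns something like C:\Reports\[Dashboard.xlsx]System, and because FIND is 1-based, taking FIND(...) - 1 characters keeps exactly the text before "[". The sample path and the data file name "Data.xlsx" are hypothetical.

```csharp
using System;

public static class DashboardPaths
{
    // Mirrors =LEFT(CELL("filename"); FIND("["; CELL("filename"); 1) - 1):
    // keep everything before the "[" that starts the workbook name.
    public static string FolderFromCellFilename(string cellFilename)
    {
        int bracket = cellFilename.IndexOf('[');
        if (bracket < 0)
            // CELL("filename") is empty for a workbook that was never saved,
            // in which case Excel's FIND would return #VALUE!.
            throw new ArgumentException("No workbook name found; has the file been saved?");
        return cellFilename.Substring(0, bracket);
    }

    // Concatenates the folder with the expected name of File B, as the
    // "system" cell does. dataFileName is whatever name the export uses.
    public static string ExpectedDataPath(string cellFilename, string dataFileName)
        => FolderFromCellFilename(cellFilename) + dataFileName;
}
```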
I then added a named range pointing to the exact cell that contains this value.
Next, I added a new Power Query function like this:
= (rangeName) => Excel.CurrentWorkbook(){[Name=rangeName]}[Content]{0}[Column1]
The function takes the name of a named range as a parameter and returns its value. In my case I called the function GetValue. This function can now be used in other Power Query scripts.
Finally, I opened the Power Query script that is responsible for loading the data from the other Excel file. In that script, I changed the path of the file to this:
Source = Excel.Workbook(File.Contents(GetValue("FilePath")), null, true),
The only thing to notice here is the call to GetValue("FilePath") for the path parameter of File.Contents. FilePath is the name I gave the range pointing to the cell. All it does is load the path from my sheet and use it as the path of the Excel file holding the data.
A quite convoluted solution, but it works.

Modifying Excel Sheet through C# without Interop.Excel

I know questions like this are around on Stack Overflow, and there are third-party libraries that do the trick, but none of them fixes my issue at the moment. So, here is the issue.
I have an Excel workbook (.xlsx) with multiple sheets generated by another system. I have to read the data from this via SSIS and dump it to a SQL DB.
The Excel file contains data, and when I open it manually it opens without any error and the data displays. But when I use a script task with an OLEDB connection, the connection is made successfully, yet when reading data the column names are not picked up (I get F1, F2, and so on) and no data rows are read. I simply get a blank row and that's about it. I have tried HDR=YES and NO, and IMEX=1 and 0, but the result is always the same.
The funny thing is that if I open the file and make some modification (for example, change a sheet name, save, change the name back, save and close), the package then picks up the data without any issue (I also noticed that the file size increases from 164 KB to 196 KB). Because of this, what I am trying to do is modify the file a bit and save it via code.
The first thing I tried was Office.Interop.Excel, and it works like a charm on my machine, but there is no Office on the server, so it doesn't work there. And no, the IT guys are never going to install the Access Database Engine, Excel, or anything else there.
Then I tried OpenXML, the third-party library NPOI, and even an OLEDB connection to modify the file. With both NPOI and OLEDB the file was changed, but it still wasn't picked up properly by the SSIS package (I noticed that the file size didn't change and remained at 164 KB). OpenXML wasn't able to open the file and threw an error saying "The document cannot be opened because there is an invalid part with an unexpected content type".
So right now I am stuck with no proper method in sight and would appreciate any help in solving this, either through C# code or any other SSIS method. The SSIS version I am using is 2008.
Edit 1
I noticed that the script task is able to read the data from the first of the multiple sheets, but the other sheets are the problem. So somewhere the XML for these sheets is broken. Is there any way I can copy the XML configuration of the first sheet to the other ones? Just a thought...
Edit 2
The first sheet has the ContentType "application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml", while all the other sheets have the ContentType "application/xml".
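Edit 2 suggests a possible direct repair, offered here only as an untested sketch and not as what was actually done (the asker's final solution follows). The idea is to give each worksheet part an explicit Override with the worksheet content type in [Content_Types].xml, so it no longer falls back to the generic application/xml default. It uses only System.IO.Compression and System.Xml.Linq, assumes the sheets live under xl/worksheets/, and assumes the content type is the only defect, which may not hold for every generator. Run it on a copy.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class ContentTypeRepair
{
    private static readonly XNamespace Ct =
        "http://schemas.openxmlformats.org/package/2006/content-types";
    private const string WorksheetType =
        "application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml";

    // Ensures every xl/worksheets/*.xml part has a worksheet Override.
    // Returns the part names that were added or changed.
    public static string[] FixWorksheetContentTypes(string xlsxPath)
    {
        using var zip = ZipFile.Open(xlsxPath, ZipArchiveMode.Update);
        var ctEntry = zip.GetEntry("[Content_Types].xml")
            ?? throw new InvalidDataException("[Content_Types].xml not found");
        XDocument doc;
        using (var input = ctEntry.Open())
            doc = XDocument.Load(input);
        var types = doc.Root ?? throw new InvalidDataException("empty [Content_Types].xml");

        // Zip entry names have no leading slash; Override PartName values do.
        var sheetParts = zip.Entries
            .Select(e => "/" + e.FullName)
            .Where(p => p.StartsWith("/xl/worksheets/", StringComparison.OrdinalIgnoreCase)
                     && p.EndsWith(".xml", StringComparison.OrdinalIgnoreCase))
            .ToList();

        var changed = new List<string>();
        foreach (var part in sheetParts)
        {
            // Part names compare case-insensitively in OPC packages.
            var ov = types.Elements(Ct + "Override").FirstOrDefault(o => string.Equals(
                (string)o.Attribute("PartName"), part, StringComparison.OrdinalIgnoreCase));
            if (ov == null)
                types.Add(new XElement(Ct + "Override",
                    new XAttribute("PartName", part), new XAttribute("ContentType", WorksheetType)));
            else if ((string)ov.Attribute("ContentType") != WorksheetType)
                ov.SetAttributeValue("ContentType", WorksheetType);
            else
                continue; // already correct
            changed.Add(part);
        }

        if (changed.Count > 0)
        {
            ctEntry.Delete();
            using var output = zip.CreateEntry("[Content_Types].xml").Open();
            doc.Save(output);
        }
        return changed.ToArray();
    }
}
```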
I ultimately ended up using two libraries for this. The data was read without an issue using ExcelDataReader (http://exceldatareader.codeplex.com/). With it, the data was read into a DataSet easily and then written to a new Excel file using EPPlus (http://epplus.codeplex.com/).
After that, when the new Excel file was read by the SSIS package, the data was picked up without an issue. I hope this helps someone out there.

How to read a range of cells (e.g.A1:G30) from Excel file to GridView

I have been looking for a solution over the last few days, and I found that the EPPlus library can retrieve formatting along with the actual data, plus charts if needed, from Excel files, which is what I am aiming for at the moment.
Could you please explain step by step how to read a range of cells (like A1:P34) from an Excel file that resides at a certain path, via ASP.NET/C#?
The path would be something like //ServerName/Folder1/Folder2/Folder3/ExcelFileName.xlsx
I looked around the web, but there is no explicit documentation on this for my level of C# expertise. I tried several examples, but none displayed the Excel range on the web page (e.g. this one).
Note: the three examples I tried all included a FileUpload control, which I do not need. I want to read the Excel file from a specified location on the local network.
EPPlus library is available here.
If you can recommend any simpler resources for understanding EPPlus on:
-reading from Excel
-writing to Excel
-reading charts from Excel
EPPlus does seem wonderful in its functionality.
To read a file off the server take a look at this:
Open ExcelPackage Object with Excel application without saving it on local file path
You just need to set the path variable to your file.
To actually put the excel data on a web page, that is not so easy. See this:
Generating a HTML table from an Excel file using EPPlus?
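One possible bridge between EPPlus and a GridView, sketched under assumptions: in the EPPlus versions I have seen, reading the Value of a multi-cell range such as worksheet.Cells["A1:P34"] returns a two-dimensional object[,], so that array only needs converting into a DataTable the GridView can bind to. The converter below is plain System.Data; the EPPlus and GridView calls appear only in comments, and GridView1 is a hypothetical control ID. This shows values only, not formatting or charts.

```csharp
using System;
using System.Data;

public static class RangeToTable
{
    // Converts a rectangular block of cell values into a DataTable.
    // With firstRowIsHeader, row 0 supplies the column names.
    //
    // Possible use with EPPlus (check against your EPPlus version):
    //   using (var package = new ExcelPackage(new FileInfo(@"\\ServerName\Folder1\Folder2\Folder3\ExcelFileName.xlsx")))
    //   {
    //       var ws = package.Workbook.Worksheets[1]; // 1-based in EPPlus 4, 0-based by default in 5+
    //       var values = (object[,])ws.Cells["A1:P34"].Value;
    //       GridView1.DataSource = RangeToTable.ToDataTable(values);
    //       GridView1.DataBind();
    //   }
    public static DataTable ToDataTable(object[,] values, bool firstRowIsHeader = true)
    {
        int rows = values.GetLength(0), cols = values.GetLength(1);
        bool header = firstRowIsHeader && rows > 0;
        var table = new DataTable();

        for (int c = 0; c < cols; c++)
        {
            string baseName = header ? Convert.ToString(values[0, c]) : null;
            if (string.IsNullOrWhiteSpace(baseName)) baseName = "Column" + (c + 1);
            // DataTable rejects duplicate column names, so suffix repeats.
            string name = baseName;
            for (int n = 2; table.Columns.Contains(name); n++) name = baseName + "_" + n;
            table.Columns.Add(name, typeof(string));
        }

        for (int r = header ? 1 : 0; r < rows; r++)
        {
            var row = table.NewRow();
            for (int c = 0; c < cols; c++)
                row[c] = Convert.ToString(values[r, c]) ?? ""; // empty cells become ""
            table.Rows.Add(row);
        }
        return table;
    }
}
```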
Response to Comment:
Hosting an actual Excel sheet in a web page is temperamental at best, but there are ways to do it (I haven't tried it personally). SharePoint is probably your best option if you have it available. If not, you would have to use an iFrame or some kind of Office web component. Check this out:
how to display excel sheet in html page

Excel VSTO - Transferring a list object onto a worksheet

I have a VSTO add-in I am looking to implement.
I would like to click a button and a list of products, names, etc would be placed onto the worksheet.
I understand that I could go through each individual item in the list and then write this cell by cell, but is there a way of literally just 'dumping' the data onto the worksheet?
Apologies if this is a really thick question.
Nope, there is no easy way to just 'dump' the data. You're going to have to do it the hard way. Just google for some examples, it's easy enough.
http://www.google.nl/search?q=c%23+export+data+to+excel
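One technique this answer does not mention: Excel's object model accepts a two-dimensional array assigned to the Value2 of a range of the same size, which writes every cell in a single call instead of one call per cell. The sketch below only builds that array. The Product record and its columns are hypothetical stand-ins for your list, and the Interop lines are left as comments because they need a running Excel inside a VSTO add-in.

```csharp
using System.Collections.Generic;

// Hypothetical item type standing in for "products, names, etc".
public sealed record Product(string Code, string Name, double Price);

public static class SheetDump
{
    // Packs the list into a 2-D array: one header row, then one row per item.
    public static object[,] ToGrid(IReadOnlyList<Product> items)
    {
        var grid = new object[items.Count + 1, 3];
        grid[0, 0] = "Code"; grid[0, 1] = "Name"; grid[0, 2] = "Price";
        for (int i = 0; i < items.Count; i++)
        {
            grid[i + 1, 0] = items[i].Code;
            grid[i + 1, 1] = items[i].Name;
            grid[i + 1, 2] = items[i].Price;
        }
        return grid;
    }

    // Roughly, inside the add-in (Excel = Microsoft.Office.Interop.Excel):
    //   object[,] data = SheetDump.ToGrid(products);
    //   Excel.Worksheet ws = Globals.ThisAddIn.Application.ActiveSheet;
    //   Excel.Range target = ws.Range[ws.Cells[1, 1],
    //       ws.Cells[data.GetLength(0), data.GetLength(1)]];
    //   target.Value2 = data; // one COM call for the whole block
}
```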
That depends on where your "list of products, names, etc" are coming from. If those items are fixed, you can create a template document with a prepared worksheet containing these items. Put this document into an embedded resource of your program. When you want to create a new worksheet from this template, extract your resource to a temporary file and use the Excel API to copy the worksheet from your template to your working document.
It perhaps sounds more complicated than it is. In
Read a file from a resource and write it to disk in C#
you will find an example of how to accomplish the "extract file from resource to temporary file" part.
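A minimal sketch of that extraction step, using only Assembly.GetManifestResourceStream and stream copying. The resource name "MyAddIn.Template.xlsx" is a hypothetical example; for a file marked with Build Action "Embedded Resource" the manifest name is normally the default namespace, any folder names, and the file name joined by dots. After extracting, you would open the temp file with Workbooks.Open and use Worksheet.Copy to bring the prepared sheet into the working workbook.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class TemplateExtractor
{
    // Copies a resource stream to a new, uniquely named temp file and returns its path.
    public static string ExtractToTempFile(Stream resource, string extension = ".xlsx")
    {
        if (resource == null) throw new ArgumentNullException(nameof(resource));
        string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + extension);
        using (var file = File.Create(path))
            resource.CopyTo(file);
        return path;
    }

    // Looks the template up in this assembly's manifest resources.
    // "MyAddIn.Template.xlsx" is a hypothetical name; use your own.
    public static string ExtractTemplate(string resourceName = "MyAddIn.Template.xlsx")
    {
        var assembly = Assembly.GetExecutingAssembly();
        using var stream = assembly.GetManifestResourceStream(resourceName)
            ?? throw new FileNotFoundException("Embedded resource not found: " + resourceName);
        return ExtractToTempFile(stream);
    }
}
```

Deleting the temp file once the sheet has been copied keeps repeated clicks from filling the temp folder.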
