Reading data from an Excel document stored in Sharepoint? - c#

How can I 'read' an Excel 2003 document stored as a SharePoint SPFile? I can retrieve the document from the library with no problems using SPFile.OpenBinary() and then putting the result into a MemoryStream.
The original idea was to use OpenXML to interrogate the document (it will accept this object type when opening a document), but the Excel version (2003) rules this out.
Just to cloud the issue further, there is no guarantee that any version of Excel will be installed on the host machine, so I possibly won't be able to use the interop assemblies either.
Suggestions or solutions will be gratefully received.
When I say read, I mean pull data from named ranges, cell references etc. All of the open source libraries I have found (ExcelDataReader, NPOI, OpenXML) have some limitation or another that prohibits their use, e.g. they can't load macro-enabled sheets.
The Excel document is loaded into a SharePoint library, which exposes its contents as a collection of SPFile objects. These files can be read into a MemoryStream simply enough, but most of the libraries I have tried require a FileStream in the constructor, which means writing to the filesystem on the application server.
I've not tried SpreadsheetGear. If there's no footprint on the filesystem I'll take a look for sure, but it is not an option on this project. I'll update this thread with my findings...
I'm reduced to using the PIAs. Dirty, dirty, dirty.

SpreadsheetGear for .NET can open xls and xlsx workbooks from a memory stream with SpreadsheetGear.Factory.GetWorkbookSet().Workbooks.OpenFromStream(System.IO.Stream) and can also open them directly from a byte array with OpenFromMemory(byte[]). Once a workbook is opened, SpreadsheetGear has a comprehensive API, calculation engine, rendering engine and more.
You can see live samples here and download the free trial here.
Disclaimer: I own SpreadsheetGear LLC

I've found this library (http://exceldatareader.codeplex.com/) on CodePlex which seems to be able to read any Excel version. There might be a lot more on the web.

When you say read, what exactly do you mean? There seems to be some debate amongst developers as to how the term is defined. Either way, it shouldn't really matter whether Excel is on the system or not: I am pretty sure that anyone who wanted to view the file would need at the very least a reader anyway. That being said, I believe your concern is a moot point and that using a MemoryStream should suffice.

Related

Reading external styles.xml file and apply to our workbook

I gather that reading an external styles.xml file and applying it to our workbook using OpenXML would be more efficient than styling in our code. I have done a lot of research but still have not found a solution.
Please tell me whether it is possible or not and give me some resources (if any) about this.
Yes, it's possible to modify styles.xml directly, but that doesn't mean it's a good idea. I've seen a few people try to work directly with the styles, but I have yet to see it turn out well, and these are people with many years of experience with XML, coding and Excel in general, who figured they had exhausted the usability of Excel's built-in features and could do a better job than Microsoft.
If you are confident that you can build a stylesheet more appropriate for your project (than what Excel's developers provided), you'll be able to find full specifications by researching ISO 29500-1.
Overview from Wikipedia
ISO/IEC 29500-1:2016 (7000+ pages)
Generating Excel 2010 Workbooks by using the Open XML SDK 2.0
How to: Manipulate Office Open XML Formats Documents
How to Diagnose Excel file corruption and Repair Workbooks
[Solved] File corrupt and therefore cannot be repaired
*I'll admit I'm very curious about how Excel's stylesheet is no longer suitable for your organization's requirements!

creating xls file in web server

I am creating an xls document using Microsoft.Office.Interop.Excel but it's not working in IIS7.
I am getting the error message below
Microsoft Office Excel cannot open or save any more documents because
there is not enough available memory or disk space. • To make more
memory available, close workbooks or programs you no longer need. • To
free disk space, delete files you no longer need from the disk you are
saving to.
Is there any other free tool or option that we can use to create xls files on a web server?
First of all, it is not a good idea to use the Office interop API on a server, as described in this link.
For free tools you could look at ExcelLibrary and EPPlus, which seem to be popular, though I have never used them. The other option is the Open XML SDK 2.5. With it you can create only xlsx files, and it has a learning curve.
For more info, have a look at this topic, which, although it is closed, probably contains all the info you need.

OLE DB vs OPEN XML SDK vs Excel.interop

I need to read XLSX files and extract as much content as possible from them. Which of the APIs should I use?
OLE DB, Open XML SDK, or Excel Interop?
Which is the easiest to use?
Can you retrieve all the information using one or the other? E.g. dates, times, merged cells, tables, pivot tables, etc.
You can try all of them and choose the one that fits you most...
Depending on the data you want to read, I'd suggest using Open XML over Interop or OLE DB.
I don't know the Open XML SDK, but I have some experience with the EPPlus library, which I use a lot and can only say good things about: it's fast, easy to learn, and comes with good examples. The library is based on the Office Open XML format, so I suppose it's much the same as the SDK you mentioned, and it can easily read and write Excel 2007 and 2010 files.
On the linked site you'll find the library itself, documentation and some example "Hello World" projects to download.
Why that library in first place? Because with it you can read not only cell values but also their colors, fonts, widths and heights, merging and all that detailed stuff, and you can modify all of it as well. What's more, you don't need Excel installed to do that.
In second place, in case you only need to extract tabular data from a worksheet, you could use OLE DB. I'm afraid that with it you won't be able to extract any information about formats, colors etc., and the data must be laid out as a table in the worksheet, so that you can treat it as a database table.
The last one is Interop, because:
- it's a COM library, so you need to be very careful when using it from .NET, as it's easy to cause ugly, hard-to-find memory leaks (confirmed by my own bad experience); if you don't release its objects properly, it leaves the Excel.exe process running,
- it's much slower than the previous methods,
- it adds almost no value over the previous methods (EPPlus or OLE DB) and requires Excel to be installed on the client's machine, so why use it?
Good luck, then.

Reading Word Documents stored in Oracle DB as a BLOB object using C#

We store a word document in an Oracle 10g database as a BLOB object. I want to read the contents (the text) of this word document, make some changes, and write the text alone to a different field in a C# code.
How do I do this in C# 2.0?
The easiest logic that I came up with is this -
1. Read the BLOB object
2. Store it in the FileSystem
3. Extract the text contents
4. Do your job
5. Write the text into a separate field.
I can use Word.dll but not any commercial solutions such as Aspose
I assume that you already know how to do steps 1 and 2 (use the Oracle.DataAccess and System.IO namespaces).
For steps 3 and 5, use Word Automation. This MS support article shows you how to get started: How to automate Microsoft Word to create a new document by using Visual C#
If you know what version of Word it will be, then I'd suggest using early binding, otherwise use late binding. More details and sample code here: Using early binding and late binding in Automation
Edit: If you don't know how to use BLOBs from C#, take a look here: How to: Read and Write BLOB Data to a Database Table Through an Anonymous PL/SQL Block
This keeps coming up in my searches, so I'll add an answer for the benefit of future readers.
I highly recommend avoiding Word automation. It's painfully slow and subjects you to the whims of Microsoft's developers with each upgrade. Instead, process the files yourself if you can. A docx file is nothing but a zipped archive of XML files and resources (such as images embedded in the document).
In this case, you'd simply unzip the docx using your preferred library, manipulate the XML, and then zip the result back up.
This does require the use of docx files rather than doc files, but as the link above explains, this has been the default Word format since Office 2007 and shouldn't present an issue unless your users are desperately clinging to the past.
As an example of the time savings: back in 2007 we converted one process that took 45 minutes using Word automation and, on the same hardware, it took 15 SECONDS processing the files manually. To be clear, I'm not blaming Microsoft for this. Their Word automation methods don't know how you will manipulate the document, so they have to anticipate and track everything that you could possibly change. You, on the other hand, can write your method with laser focus because you know exactly what you want to do.
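The unzip, edit, re-zip approach described above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the code from that 2007 project; the function name `replace_text_in_docx` is made up for the example. It assumes the main body lives in `word/document.xml` (the usual location) and that the text to replace sits inside a single run; Word often splits visible text across several runs, in which case a plain string replace will not find it. The same steps map onto any ZIP and XML library available from C#.

```python
import io
import zipfile


def replace_text_in_docx(docx_bytes: bytes, old: str, new: str) -> bytes:
    """Return a copy of a .docx package with `old` replaced by `new` in the body XML.

    Hypothetical helper for illustration only. `new` must already be
    XML-safe, because it is spliced into the markup as-is.
    """
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            # Only the main document part is edited; every other part
            # (styles, images, relationships) is copied through unchanged.
            if item.filename == "word/document.xml":
                data = data.decode("utf-8").replace(old, new).encode("utf-8")
            dst.writestr(item, data)
    return out_buf.getvalue()
```

Everything happens in memory, so bytes read from a database BLOB can be passed in directly and the result written back without touching the filesystem.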

How can I programmatically create, read, write an excel without having office installed?

I'm confused as hell with all the bazillion ways to read/write/create excel files. VSTO, OLEDB, etc, but they all seem to have the requirement that office must be installed.
Here is my situation: I need to develop an app which will take an excel file as input, do some calculations and create a new excel file which will basically be a modification of the first excel file. All with the constraint that the machine that runs this may not have office installed. (Don't ask why...)
I need to support all Excel formats. The only saving grace is that the spreadsheets themselves are really simple. Just a bunch of columns and values, nothing fancy. And unfortunately no CSV, as the end user might not even know what a CSV file is.
write your excel in HTML table format:
<html>
<body>
<table>
<tr>
<td style="background-color:#acc3ff">Cell1</td>
<td style="font-weight:bold">Cell2</td>
</tr>
</table>
</body>
</html>
and give your file an xls extension. Excel will convert it automatically
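Generating such a table from data is straightforward. The following is a rough sketch in Python (standard library only); the function name `rows_to_html_table` and the file name `report.xls` in the usage note are invented for the example. Cell values are HTML-escaped so that characters such as `<` or `&` in the data do not break the markup.

```python
import html


def rows_to_html_table(rows):
    """Render rows (an iterable of iterables) as a minimal HTML page with one table."""
    lines = ["<html>", "<body>", "<table>"]
    for row in rows:
        # Escape each value so data containing markup characters stays intact.
        cells = "".join(f"<td>{html.escape(str(value))}</td>" for value in row)
        lines.append(f"<tr>{cells}</tr>")
    lines += ["</table>", "</body>", "</html>"]
    return "\n".join(lines)
```

Writing the returned string to a file such as `report.xls` gives Excel something it can open. Note that this only covers creating files; the result is HTML rather than a real workbook, and newer Excel versions may warn that the content does not match the extension.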
Without Office installed you'll need something designed to understand the Excel binary file format (unless you only want to open Office 2007 .xlsx files).
The best I've found (and that I use) is SpreadsheetGear, which in addition to being .NET native, is much faster and more stable than the COM/OLE solutions (which I've used in the past).
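Whichever library you pick, it helps to know which of the two containers you have been handed before choosing a code path. A small sketch in Python, assuming the file content is already in memory as bytes (the function name is made up for the example): the binary .xls format is stored in an OLE2 compound file, and .xlsx is a ZIP package, and each starts with a well-known signature. The check identifies only the container, so a .doc file or an arbitrary ZIP archive would pass it too.

```python
# Signature of an OLE2 compound file, the container used by binary .xls.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature that starts a ZIP archive such as .xlsx.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_container(data: bytes) -> str:
    """Return 'ole2', 'zip' or 'unknown' based on the leading bytes."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```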
Read and write CSV files instead. Excel reads them just fine and they're easier to use. If you need to work against .xls files, then try supporting OpenOffice as well as Excel. OpenOffice can read and write Excel files.
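For the "columns and values, nothing fancy" case in the question, the whole read, calculate, write cycle over CSV is short. A sketch in Python with the standard `csv` module; the function name, the `Total` column and the layout (a label in the first column, numbers in the rest) are assumptions made up for the example.

```python
import csv
import io


def add_total_column(csv_text: str) -> str:
    """Append a 'Total' column holding the sum of each row's numeric cells."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader)
    writer.writerow(header + ["Total"])
    for row in reader:
        # Assumed layout: first cell is a label, the remaining cells are numbers.
        total = sum(float(value) for value in row[1:])
        writer.writerow(row + [total])
    return out.getvalue()
```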
Did you consider way number bazillion and one: using the Open XML SDK? You can retain styles and tweak them to your liking. Anything you can do in an actual file is possible to achieve programmatically. The SDK comes with a tool called Document Reflector that shows the underlying XML and even shows LINQ statements that can be used to generate it. That is key to playing around with it, seeing how the changes are made, then recreating that in code.
The only caveat is this will work for the new XML based formats (*.xlsx) not the older versions. There's a slight learning curve but more material is making its way on blogs and other sites.
If cost is not an issue, I'd suggest looking in Aspose's Excel product. I use their Word product and I've been satisfied.
Aspose.Cells
Excel XLSX files are "just" XML files, or more precisely ZIP files containing several XML files. Just rename an Excel file Test.xlsx to Test.zip and open it with your favourite ZIP program. The XML schemas are, afaik, standardized and available. But I think it might not be that easy to manipulate them using only primitive XML processing tools and frameworks.
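To make this concrete, here is a rough sketch in Python (standard library only) that opens an xlsx package from memory and pulls out the shared string table, which is where most cell text is stored. The function name is invented for the example, and it assumes the part sits at its conventional path `xl/sharedStrings.xml`; strictly speaking the location is declared in the package's relationship files. It also concatenates every text element of an entry, so rich-text runs are joined and any phonetic text would be included.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by the workbook parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_shared_strings(xlsx_bytes: bytes) -> list:
    """Return the workbook's shared strings in index order (empty if none)."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as pkg:
        if "xl/sharedStrings.xml" not in pkg.namelist():
            return []
        root = ET.fromstring(pkg.read("xl/sharedStrings.xml"))
    # Each <si> entry holds one string, possibly split over several <t> runs.
    return [
        "".join(t.text or "" for t in si.iterfind(".//m:t", NS))
        for si in root.iterfind("m:si", NS)
    ]
```

Even this small step shows why the raw-XML route gets tedious: cells in the worksheet parts refer to these strings by index, so reading a single value already means joining two files.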
Excel files are in a proprietary format so (afaik) you're not going to be able to do this without having the office interop available. Some third party tools exist (which presumably licence the format from MS?) but I've not used them myself to comment on their usefulness.
I assume that you can't control the base file format, i.e. simple CSV or XML formats aren't going to be possible?
I used to use a very nice library called CarlosAg, which uses Excel XML format. It was great (and Excel recognizes the format), and also incredibly fast. Check it out here.
Oh, as a side note, we used to use this for the very same reason you need it. The servers that generated these files were not able to have Excel installed.
If you cannot work with CSV files as per @RHicke's suggestion, and assuming you are working on a web app (since a desktop app would be guaranteed to have XL installed as per requirements), I'd say create your processing app as a web service and build an XL add-in which will interact with your web service directly from XL.
For XLSX files, look at using http://www.codeplex.com/ExcelPackage. Otherwise, some paid 3rd party solutions are out there, like the one David suggested.
I can understand the requirement of not having office installed on a server machine.
There are many libraries available, such as Aspose, though some of them require a license.
If you are targeting MS Excel formats, a native interoperability library from Microsoft, the ACE OLEDB data provider, is available; you can install it on a machine and start reading and writing programmatically. You need to define a connection string and commands as per your needs. (Ref: This article #yoursandmyideas talks about using this library, along with setup and troubleshooting information.)
