MS Visio (.vsd) Binary File Format Structure Documentation - c#

Would anybody have Microsoft Visio 2003 - 2010 (.vsd) binary file structure documentation?
I am talking about the older OLE2 compound binary file format (Visio 2003 - 2010) (ex., .vsd, .vsw), not the newer Open XML file format, like .vsdx, .vstx, ...
I spent a lot of time on internet (especially Microsoft Website) trying to find it. Unfortunately, so far without success. Thank you.

Your findings are correct, binary vsd format is not publicly documented.

Related

How to generate xls & xlsx from office 2003 xml spreadsheet programatically in c#

Currently our application exporting large data to office 2003 xml spreadsheet format in server.
The user can download the xml file.The file can be easily opened in office 03 and 07 correctly.
I want to know whether it is possible to create xls and xlsx format from this xml file in server and serve them to user?
[The server doesn't have office and neither will be.So interop is useless in this case.]
====== EDIT=====
Can't use third party solutions... :(
With your XML file, you could easily use XSLT to transform your XML into CSV. However, if you really need to export in native Excel spreadsheet types, it's probably a good idea to use a component.
Something we've used in the past, that I'd recommend, is Aspose.Cells – we've used this to generate Excel spreadsheets where plain CSV output wouldn't suffice (are you sure it won't suffice in your case - XSLT will make short work of that).
Whilst it's not free, it saves having to install Office on the server (and avoid using Office Automation etc.) and has very good support.
Edited following comments:
If you can't use third party components, if you can't install Office on the server, and you can't use interop, then – unless you want to get comfortable and read the Excel Binary File Format (.xls) Structure Specification as provided by Microsoft – the answer is a very short one: no.
If I understand your question corretly, you are not just serving 'pure' XML, but SpreadsheetML that is interpreted by Excel just as an ordinary Excel-file in binary format. So your users already have a very nice solution. How do are you creating those files? I use an XSLT to transform my XML-data to SpreadsheetML that is just a specific XML-format. In that way I can do anything that is possible in Excel (formattings, formulas etc.). It yould be much more than just transforming XML to CSV.
Why do you want to change this? I can see that the resulting XML-Excel file usually is huge, is that the reason?
Now, since Excel 2007 format (xlsx) is basically a zip-archive of some XML-files describing the structure and the data of the file -- just rename the file extension name of any xlsx to zip and inspect after extracting -- you might be able to adapt your existing solution, i.e. instead of creating one SpreadsheetML-(XML)File create several and pack/zip them one the server. I assume your normal server installation should have the necessary tools.
BUT - this solution is difficult to maintain (as you might know wiht you current solution -- if I am at all right) and changes in the XSLT to add new functionality requires quite a bit of knowledge. So again I as well can recommend Aspose.Cells, even some knowledgeable users might be able to change simple formats.
HTH
Andreas
If you don't have office 2003 *or later) DLLs referenced to the project, I'm afraid you can't read that Office XML data in your program directly, unless of course you can manually parse the whole XML by studying its format.
You can use Open XML SDK to create or read xlsx files.
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=5124

C# output to Excel spreadsheet, using Excel 2000+

I'm writing a program that reads a text file, extracts information, and outputs it to a template Excel spreadsheet that already exists.
I've managed to do this on my computer using the Microsoft.Office.Interop.Excel reference and its related methods, and it works fine. I have Excel 2010. However the computers that this program will be used on mostly have either Excel 2000 or Excel 2003, and it won't work on them.
Does anyone know a way to make a program target all versions of Excel from 2000 upwards?
Cheers,
Greg
If your needs are simple and you don't have $900 for Aspose.Cells to throw around, you can do any of the following:
Use NPOI to read, inject data into, and export your template.
Create a basic HTML file with a table and just named it *.xls. You can save your template in Excel as HTML and replace bits and pieces to insert your data.
Create an XML file using Office 2002/2003 XML format, it's pretty straightforward (caveat: can't be read in Excel 2000). As above, you can save your template in XML Spreadsheet format, read it in, and do some simple stuff to inject your data.
You really need to target 2000 or under. 2010 and 2003 will open a 2000 format document whereas 2000 will not open a 2010 document. Office has a single format for 97-2000 and that's what you need to create to make everybody happy.
Interop depends on the version you have installed and I personally have dodged using interop due to its "unmanaged" nature (and it seems to love file locks).
If you want hassle free and extremely fast/powerful creation of Excel documents, you really can not do better than Aspose.Cells in my opinion.
Find it here.

How can I programmatically create, read, write an excel without having office installed?

I'm confused as hell with all the bazillion ways to read/write/create excel files. VSTO, OLEDB, etc, but they all seem to have the requirement that office must be installed.
Here is my situation: I need to develop an app which will take an excel file as input, do some calculations and create a new excel file which will basically be a modification of the first excel file. All with the constraint that the machine that runs this may not have office installed. (Don't ask why...)
I need to support all excel formats. The only saving grace is that the formats spreadsheets themselves are really simple. Just a bunch of columns and values, nothing fancy. And unfortunately no CSV as the end user might not even know what a CSV file is.
write your excel in HTML table format:
<html>
<body>
<table>
<tr>
<td style="background-color:#acc3ff">Cell1</td>
<td style="font-weight:bold">Cell2</td>
</tr>
</table>
</body>
</html>
and give your file an xls extension. Excel will convert it automatically
Without Office installed you'll need something designed to understand the Excel binary file format (unless you only want to open Office 2007 .xlsx files).
The best I've found (and that I use) is SpreadsheetGear, which in addition to being .NET native, is much faster and more stable then the COM/OLE solutions (which I've used in the past)
read and write csv files instead. Excel reads them just fine and they're easier to use. If you need to work against .xls files then try having support for OpenOffice as well as Excel. OpenOffice can read and write excel files.
Did you consider way number bazillion and one: using the Open XML SDK? You can retain styles and tweak it to your liking. Anything you can do in an actual file is possible to achieve programatically. The SDK comes with a tool called Document Reflector that shows the underlying XML and even shows LINQ statements that can be used to generate them. That is key to playing around with it, seeing how the changes are made, then recreating that in code.
The only caveat is this will work for the new XML based formats (*.xlsx) not the older versions. There's a slight learning curve but more material is making its way on blogs and other sites.
If cost is not an issue, I'd suggest looking in Aspose's Excel product. I use their Word product and I've been satisfied.
Aspose.Cells
Excel XLSX files "just" XML files - more precisely ZIP files containing several XML files. Just rename a Excel file Test.xslx to Test.zip and open it with your favourit ZIP program. XML schemas are, afaik, standardized and availiable. But I think it might not be that easy to manipulate them only using primitive XML processiing tools and frameworks.
Excel files are in a proprietary format so (afaik) you're not going to be able to do this without having the office interop available. Some third party tools exist (which presumably licence the format from MS?) but I've not used them myself to comment on their usefulness.
I assume that you can't control the base file format, i.e. simple CSV or XML formats aren't going to be possible?
I used to use a very nice library called CarlosAg, which uses Excel XML format. It was great (and Excel recognizes the format), and also incredibly fast. Check it out here.
Oh, as a side note, we used to use this for the very same reason you need it. The servers that generated these files were not able to have Excel installed.
If you cannot work with CSV files as per #RHicke's suggestion, and assuming you are working on a web app, since a desktop app would be guaranteed to have XL installed as per requirements.
I'd say, create your processing app as a webservice, and build an XL addin which will interact with your webservice directly from XL.
For XLSX files, look at using http://www.codeplex.com/ExcelPackage. Otherwise, some paid 3rd party solutions are out there, like the one David suggested.
I can understand the requirement of not having office installed on a server machine.
There are many libraries like aspose being available, some of them requiring license though.
If you are targeting MS Excel formats, then a native, Interoperability library, ACE OLEDB data provider, from Microsoft is available which you can install on a machine and start reading, writing programmatically. You need to define a connection string and commands as per you needs. (Ref: This article #yoursandmyideas)talks about using this library along with setup and troubleshooting information.

Convert Word Document to XML and back ASP.Net

I need to convert Word Document to XML and back once editing has been performed on it.
I don't have Microsoft Office Installed at my server, and I want my users to edit their documents via Web Browser.
I am using C# and ASP.Net
Thanks
I believe the latest version of Microsoft Word (and Excel) already save files in XML format, hence the .docx and (.xlsx) extensions. Hope that suffices your need.
Alternatively, you could see if they are tools to convert the old .doc format files to .docx which should, as a result, provide you with a XML based word file.
So you have a couple of options here:
1) Use OfficeWriter from SoftArtisans. This allows you to crack open the binary office file format(e.g. .doc and .xls) note:I am biased because my company makes this product but I think it's awesome.
2) If you can use the newest file format (.docx and .xlsx) you can use the SDK that microsoft has released that will do all of that uzipping rezipping nonsense for you. (called the opem XML SDK)

How can a Word document be created in C#? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I have a project where I would like to generate a report export in MS Word format. The report will include images/graphs, tables, and text. What is the best way to do this? Third party tools? What are your experiences?
The answer is going to depend slightly upon if the application is running on a server or if it is running on the client machine. If you are running on a server then you are going to want to use one of the XML based office generation formats as there are know issues when using Office Automation on a server.
However, if you are working on the client machine then you have a choice of either using Office Automation or using the Office Open XML format (see links below), which is supported by Microsoft Office 2000 and up either natively or through service packs. One draw back to this though is that you might not be able to embed some kinds of graphs or images that you wish to show.
The best way to go about things will all depend sightly upon how much time you have to invest in development. If you go the route of Office Automation there are quite a few good tutorials out there that can be found via Google and is fairly simple to learn. However, the Open Office XML format is fairly new so you might find the learning curve to be a bit higher.
Office Open XML Iinformation
Office Open XML - http://en.wikipedia.org/wiki/Office_Open_XML
OpenXML Developer - http://openxmldeveloper.org/default.aspx
Introducing the Office (2007) Open XML File Formats - http://msdn.microsoft.com/en-us/library/aa338205.aspx
DocX free library for creating DocX documents, actively developed and very easy and intuitive to use. Since CodePlex is dying, project has moved to github.
I have spent the last week or so getting up to speed on Office Open XML. We have a database application that stores survey data that we want to report in Microsoft Word. You can actually create Word 2007 (docx) files from scratch in C#. The Open XML SDK version 2 includes a cool application called the Document Reflector that will actually provide the C# code to fully recreate a Word document. You can use parts or all of the code, and substitute the bits you want to change on the fly. The help file included with the SDK has some good code samples as well.
There is no need for the Office Interop or any other Office software on the server - the new formats are 100% XML.
Have you considered using .RTF as an alternative?
It supports embedding images and tables as well as text, opens by default using Microsoft Word and whilst it's featureset is more limited (count out any advanced formatting) for something that looks and feels and opens like a Word document it's not far off.
Your end users probably won't notice.
I have found Aspose Words to be the best as not everybody can open Office Open XML/*.docx format files and the Word interop and Word automation can be buggy. Aspose Words supports most document file types from Word 97 upwards.
It is a pay-for component but has great support. The other alternative as already suggested is RTF.
To generate Word documents with Office Automation within .NET, specifically in C# or VB.NET:
Add the Microsoft.Office.Interop.Word assembly reference to your project. The path is \Visual Studio Tools for Office\PIA\Office11\Microsoft.Office.Interop.Word.dll.
Follow the Microsoft code example
you can find here: http://support.microsoft.com/kb/316384/en-us.
Schmidty, if you want to generate Word documents on a web server you will need a licence for each client (not just the web server). See this section in the first link Rob posted:
"Besides the technical problems, you must also consider licensing issues. Current licensing guidelines prevent Office applications from being used on a server to service client requests, unless those clients themselves have licensed copies of Office. Using server-side Automation to provide Office functionality to unlicensed workstations is not covered by the End User License Agreement (EULA)."
If you meet the licensing requirements, I think you will need to use COM Interop - to be specific, the Office XP Primary Interop Assemblies.
Check out VSTO (Visual Studio Tools for Office). It is fairly simple to create a Word template, inject an xml data island into it, then send it to the client. When the user opens the doc in Word, Word reads the xml and transforms it into WordML and renders it. You will want to look at the ServerDocument class of the VSTO library. No extra licensing is required from my experience.
I have had good success using the Syncfusion Backoffice DocIO which supports doc and docx formats.
In prior releases it did not support everything in word, but accoriding to your list we tested it with tables and text as a mail merge approach and it worked fine.
Not sure about the import of images though. On their blurb page http://www.syncfusion.com/products/DocIO/Backoffice/features/default.aspx it says
Blockquote
Essential DocIO has support for inserting both Scalar and Vector images into the document, in almost all formats. Bitmap, gif, png and tiff are some of the common image types supported.
So its worth considering.
As others have mentioned you can build up a RTF document, there are some good RTF libraries around for .net like http://www.codeproject.com/KB/string/nrtftree.aspx
I faced this problem and created a small library for this. It was used in several projects and then I decided to publish it. It is free and very very simple but I'm sure it will help with you with the task. Invoke the Office Open XML Library, http://invoke.co.nz/products/docx.aspx.
I've written a blog post series on Open XML WordprocessingML document generation. My approach is that you create a template document that contains content controls, and in each content control you write an XPath expression that defines how to retrieve the content from an XML document that contains the data that drives the document generation process. The code is free, and is licensed under the the Microsoft Reciprocal License (Ms-RL). In that same blog post series, I also explore an approach where you write C# code in content controls. The document generation process then processes the template document and generates a C# program that generates the desired documents. One advantage of this approach is that you can use any data source as the source of data for the document generation process. That code is also licenced under the Microsoft Reciprocal License.
I currently do this exact thing.
If the document isn't very big, doesn't contain images and such, then I store it as an RTF with #MergeFields# in it and simply replace them with content, sending the result down to the user as an RTF.
For larger documents, including images and dynamically inserted images, I save the initial Word document as a Single Webpage *.mht file containing the #MergeFields# again. I then do the same as above. Using this, I can easily render a DataTable with some basic Html table tags and replace one of the #MergeFields# with a whole table.
Images can be stored on your server and the url embedded into the document too.
Interestingly, the new Office 2007 file formats are actually zip files - if you rename the extension to .zip you can open them up and see their contents. This means you should be able to switch content such as images in and out using a simple C# zip library.
#Dale Ragan: That will work for the Office 2003 XML format, but that's not portable (as, say, .doc or .docx files would be).
To read/write those, you'll need to use the Word Object Library ActiveX control:
http://www.codeproject.com/KB/aspnet/wordapplication.aspx
#Danny Smurf: Actually this article describes what will become the Office Open XML format which Rob answered with. I will pay more attention to the links I post for now on to make sure there not obsolete. I actually did a search on WordML, which is what it was called at the time.
I believe that the Office Open XML format is the best way to go.
LibreOffice also supports headless interaction via API. Unfortunately there's currently not much information about this feature yet.. :(
You could also use Word document generator. It can be used for client-side or server-side deployment. From the project description:
WordDocumentGenerator is an utility to generate Word documents from
templates using Visual Studio 2010 and Open XML 2.0 SDK.
WordDocumentGenerator helps generate Word documents both
non-refresh-able as well as refresh-able based on predefined templates
using minimum code changes. Content controls are used as placeholders
for document generation. It supports Word 2007 and Word 2010.
Grab it: http://worddocgenerator.codeplex.com/
Download SDK: http://www.microsoft.com/en-us/download/details.aspx?id=5124
Another alternative is Windward Docgen (disclaimer - I'm the founder). With Windward you design the template in Word, including images, tables, graphs, gauges, and anything else you want. You can set tags where data from an XML or SQL datasource is inserted (including functionality like forEach loops, import, etc). And then generate the report to DOCX, PDF, HTML, etc.

Categories

Resources