Hi all, I need to read many Word documents on a server.
But as you know, the Word component in .NET is based on COM, so it would not work reliably if I used it there.
It is so frustrating. In Word 2007, thanks to the XML-based file format, there are many ways to read a document.
I want to find an open-source way to read Word 2003 files. Thanks.
I'd suggest that you use the NPOI library to read .doc files: http://npoi.codeplex.com/
The library is the .NET version of the POI Java project and licensed under the Apache License.
Related
Is it possible using OpenXML to break links to external spreadsheets and convert formulas linked to them to values similar to how this works with the MS Office interop libraries?
Depending on what you mean with "similar to how this works with the interop libraries", the simple answer is yes. Those links are all reflected in the Open XML markup, which you can change using the open-source Open XML SDK (see https://github.com/OfficeDev/Open-XML-SDK). Thus, while the Open XML SDK-provided classes and APIs are certainly different from the Microsoft Office interop library-provided classes and APIs, the outcome or effect is the same.
Mind you, the Open XML SDK provides you with very low-level access to the Open XML markup, so there is a very steep learning curve for somebody without prior experience with Open XML or the Open XML SDK.
To see how this can be done with the Open XML SDK, I recommend you download the Open XML Productivity Tool (but use the latest NuGet package for any development). Should you have two versions of the same workbook, i.e., one with external links and one with those links removed, the Productivity Tool allows you to compare those documents and shows the differences in the Open XML markup and the C# code required to remove the Open XML markup, using the Open XML SDK.
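For a quick look at where those links live before touching the SDK, here is a minimal read-only sketch in Python (standard library only, not the Open XML SDK). It assumes the usual package layout, in which each external workbook link has a relationship part under xl/externalLinks/_rels/ whose Relationship element carries TargetMode="External". The function and helper names are made up for illustration, and the sketch only lists the link targets. Actually breaking the links would also mean replacing the dependent formulas with their cached values and removing the external references from the workbook part, which is not shown here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the package relationship (.rels) parts.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
# Assumed location of the relationship parts for external workbook links.
RELS_DIR = "xl/externalLinks/_rels/"


def external_link_targets(source):
    """Map each external-link .rels part to the external targets it names.

    `source` is a path or a binary file-like object holding an .xlsx package.
    """
    targets = {}
    with zipfile.ZipFile(source) as package:
        for name in package.namelist():
            if not (name.startswith(RELS_DIR) and name.endswith(".rels")):
                continue
            root = ET.fromstring(package.read(name))
            for rel in root.iter(f"{{{REL_NS}}}Relationship"):
                if rel.get("TargetMode") == "External":
                    targets.setdefault(name, []).append(rel.get("Target"))
    return targets


def make_sample_package(target):
    """Demo helper: an in-memory zip holding one external-link .rels part.

    This is not a valid workbook; the Type value is a placeholder.
    """
    rels = (
        f'<Relationships xmlns="{REL_NS}">'
        f'<Relationship Id="rId1" Type="placeholder" Target="{target}" '
        'TargetMode="External"/>'
        "</Relationships>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(RELS_DIR + "externalLink1.xml.rels", rels)
    buffer.seek(0)
    return buffer
```

Calling external_link_targets("book.xlsx") on a real workbook (the file name is just an example) returns a dictionary keyed by relationship part, which tells you which parts to compare in the Productivity Tool.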
I have C# code for Word automation in MS Office, but I want to do the same kind of automation in OpenOffice using C#. Is that possible with OpenOffice? I want to read, write and "Save As" .doc files programmatically in OpenOffice using C#.
Thanks in advance
Short answer: no.
Word is not part of OpenOffice, so how would that work?
You can, however, automate Writer, but it has a different object model, so much of your code would have to be rewritten.
If Open Office were to expose a compatible API I think that would raise some interesting Copyright and Intellectual Property issues but, I'm no lawyer. However, I'm not extolling the benefits of either product or API, they are just different.
It's possible. You must install OpenOffice and the OpenOffice SDK. Once the SDK is installed, run setsdkenv_windows.bat in the SDK folder; it will ask for some other components, of which you need to install only 3 or 4, from here: http://gnuwin32.sourceforge.net/packages.html.
The sample for Calc could be in a path like C:\Program Files\OpenOffice.org 3\Basis\sdk\examples\CLI\CSharp\Spreadsheet
The samples for Writer are only available online. Right now I still need the samples for Impress, but the automation for Calc is 10/10.
I need to export data into a Word document using ".dot" templates, from an application written in ASP.NET.
Can you give me some links to learn how to do this? I'm sure it is possible without any external libraries.
Thanks to the universal knowledge ;-)
I doubt you will be able to do this without any extra libraries.
If you are using the Word 2007+ formats (e.g. .docx, .dotx), you can use the Open XML SDK. If not, your best bet is probably Aspose.Words.
You'll need to have MS Word installed on the machine that will fill the templates (.dot files), and then you can call an instance of Word to do the replacements for you, using Microsoft.Office.Interop.Word.
If this has to be implemented on a server, you might have a problem. Microsoft advises against installing Office on servers because of stability and reliability issues. http://support.microsoft.com/kb/257757 focusses on some of these problems and also mentions some alternatives.
I'm looking to read the contents of a Word file in an application running on a web server, without having Word installed. Does a native .NET solution for this exist?
Aspose makes a paid solution for doing just about anything with any Office format:
http://www.aspose.com/categories/.net-components/aspose.words-for-.net/default.aspx
There are commercial options too, but that's what OpenXML is all about as long as you are dealing with docx files only. If you need doc files, you will probably need to purchase Aspose's Aspose.Words for .NET.
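Since the choice of library hinges on whether a file is .docx or .doc, it can help to check the container rather than trust the extension. A minimal sketch in Python, assuming you only look at the first bytes of the file: Open XML files are ZIP packages, and the older binary .doc format is stored in an OLE2 compound file. The function name is made up for illustration. Note that the signature identifies the container only; .xls and .ppt files are also OLE2, and .xlsx and .pptx are also ZIP.

```python
# Signature of an OLE2 compound file (binary .doc, .xls, .ppt).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature at the start of a typical ZIP archive
# (.docx, .xlsx, .pptx are ZIP packages).
ZIP_MAGIC = b"PK\x03\x04"


def sniff_office_container(header: bytes) -> str:
    """Classify a file by its leading bytes: 'ole2', 'zip' or 'unknown'."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of `path` and classify the container."""
    with open(path, "rb") as handle:
        return sniff_office_container(handle.read(8))
```

A result of "zip" suggests the Open XML route is open; "ole2" means you are most likely dealing with a legacy binary format.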
I have used several SDKs. So far the best is Aspose.Words. The Open XML SDK 2.5 is also a nice choice, but its API is very low-level, so using it means writing more code. Remember to use the Open XML SDK tool alongside it; it is a nice tool that can make the coding simpler.
You can watch this video for an overview:
how to use openxml sdk tool
Another choice: GemBox.Document, a commercial option that is cheaper than Aspose.Words.
The release notes of a piece of software contain some important data that I would like to extract for every release. Is there a way to extract certain information from a Microsoft Word document?
The application that I am thinking of would be written in C#, but I am okay if it is any other solution.
All MS Office products (Word, Excel, etc.) are fully scriptable, both internally (using VBA) and externally (via OLE Automation, also known as ActiveX; in fact, VBA uses the interface exposed through OLE).
My suggestion would be to look for a library in your language that supports this. Here is a link to a Perl module, Win32::OLE, that does: as you can see, it's quite easy to use and very powerful. The interface should be similar for other languages.
I went through this a few years back. You can:
Use Word to convert the file into some other format, ASCII, RTF, XML etc.
Use some third-party app to convert to another format, such as ASCII.
Access the Word API through OLE and extract the information directly.
I couldn't find any generic libraries to read Word files, and back then all of the applications that read Word files only worked for a subset. Word changed often enough that they had trouble keeping up.
There were some documents that listed the specifics of the older Word file formats, but the underlying file structure is outrageously complicated. Without a lot of resources it would be hard to keep code in sync with the file format.
Initially, I used Perl to drive Word and create new documents, but the solution was too fragile. Later I switched the whole application to work with PDFs instead, and gave up on Word.
Paul.
Probably not the most elegant solution, but this seems to be the lightest method: use a script run with CScript.
I just tried it on a sample Word document (2003) and it works perfectly.
More information: http://www.gregthatcher.com/Papers/VBScript/WordExtractScript.aspx
I did a lot of Excel programming with VSTO (Visual Studio Tools for Office), and I think you will be able to use the VSTO API to read a Word document. You should be able to use C#.
You could write an IFilter to extract text from word files. No need to have Word installed.
You can work from within Word (VBA, VSTO) or outside it.
From outside it, automation is one approach.
Another is to avoid using Word entirely. If the docs are .docx, you can use anything which can manipulate an Open XML file. Microsoft has its Open XML SDK, and in the Java world you can use docx4j or POI.
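To illustrate the "anything which can manipulate an Open XML file" route, here is a minimal sketch in Python using only the standard library. It assumes the main document part sits at word/document.xml, which is the usual location but is strictly speaking determined by the package relationships. It collects the text of every w:t element inside each w:p paragraph. The function names are made up for illustration, and the sketch ignores headers, footers, footnotes and tracked changes; text inside text boxes may show up both in the enclosing paragraph and on its own.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by document.xml in .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the text of each paragraph in the main document part.

    `source` is a path or a binary file-like object holding a .docx package.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join every w:t inside it.
        texts = (node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append("".join(texts))
    return paragraphs


def make_sample_docx(lines):
    """Demo helper: an in-memory zip holding only word/document.xml.

    Not a valid Word file (no content types part), and `lines` are not
    XML-escaped; it exists only to exercise docx_paragraphs.
    """
    body = "".join(f"<w:p><w:r><w:t>{line}</w:t></w:r></w:p>" for line in lines)
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer
```

On a real file you would call docx_paragraphs("notes.docx") (the file name is just an example) and search the returned list for the information you need. This approach does not work for .doc files, which are not ZIP packages.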