I'm working on code for an MS Word to HTML converter. After googling for about half a minute, I found code that does exactly what I need. It works locally on the ASP.NET Development Server, but it doesn't work when I upload the files to my server.
I read a couple of posts, and the problem seems to be that the server does not have MS Office installed. I'm not sure whether it does (I'm still awaiting an email from the good people at my hosting company, but I assume it's not installed), so my question is...
Is there ANY way to make it work without MS Office installed?
I'm using Microsoft.Office.Interop.Word version 12, ASP.NET 3.5 and C#, and the error I'm getting is:
Could not load file or assembly 'Microsoft.Office.Interop.Word, Version=12.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c' or one of its dependencies.
Thank you for your time!
The Interop library is not a "working" library in itself, it is only a wrapper around winword.exe for .NET programs, so using this library does not make any sense if you don't install or use Microsoft Word.
Instead you will need to find a library that allows for manipulating Word Documents. If you can constrain the documents to be in the new format (docx), then it will be quite an easy task, e.g. using the OOXML SDK (as proposed by Stilgar, too). But there are libraries for the old format, too.
Update: I have to admit that, although I was convinced I had searched for and found some libraries for the old doc format before, I can no longer find them, probably because the result lists are "spoiled" by the many offers for docx. To be clear:
If you can afford to stick to docx (2007 or later) format, you should do that. Office Open XML is a (more or less) open standard based on ZIP and XML, and many tools already exist and will be developed in the future. The old format is much less supported nowadays.
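Because a .docx file is just a ZIP package of XML parts, its text can be read with nothing but the .NET base class library, with no Word and no Interop assembly. The following is a minimal sketch, not code from any of the answers here. It assumes .NET 4.5 or later for `ZipArchive` (on .NET Framework you also need a reference to System.IO.Compression.dll; on .NET 3.5, `System.IO.Packaging.Package` in WindowsBase is the older equivalent), and it reads only the main body part, `word/document.xml`. `DocxText` and `ExtractText` are hypothetical names.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxText
{
    // Main WordprocessingML namespace (the "w:" prefix inside document.xml).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Reads the main body part of a .docx package and returns its text,
    // one line per paragraph (w:p), built from the text runs (w:t).
    // Headers, footers, footnotes and formatting are ignored.
    public static string ExtractText(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; is this really a .docx?");

            XDocument doc;
            using (Stream part = entry.Open())
            {
                doc = XDocument.Load(part);
            }

            var paragraphs = doc.Descendants(W + "p")
                .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));
            return string.Join(Environment.NewLine, paragraphs);
        }
    }
}
```

If the document arrives as a byte array, `DocxText.ExtractText(new MemoryStream(bytes))` works the same way. This covers .docx only, and it yields plain text: producing HTML would additionally mean mapping styles, tables and images.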
If you have to go for the old format, too, then Aspose (as proposed by Uwe) is the only library I found.
I think the OOXML SDK may contain something but it will only work with docx and not with the old doc.
As for the old formats, I am also interested in a cheap and easy way to support them without having to use the Automation APIs.
You should explain more clearly what result you want to achieve.
NO WAY. MS Office interop needs MS Word to be installed on the server.
Depending on your needs, you should find the best third-party library (I suggest the Open XML SDK's WordprocessingDocument class), but your code will have to be rewritten.
You can use Code7248.word_reader.dll.
Below is sample code showing how to use Code7248.word_reader.dll: add a reference to this DLL in your project and copy the code below.
```csharp
using System;
using System.Collections.Generic;
using System.Text;
// add the extra namespace from the third-party DLL
using Code7248.word_reader;

namespace testWordRead
{
    class Program
    {
        // Extracts the plain text from a Word file and prints it to the console.
        private void readFileContent(string path)
        {
            TextExtractor extractor = new TextExtractor(path);
            string text = extractor.ExtractText();
            Console.WriteLine(text);
        }

        static void Main(string[] args)
        {
            Program cs = new Program();
            // Verbatim string (@"...") so the backslashes are not treated as escape sequences.
            string path = @"D:\Test\testdoc1.docx";
            cs.readFileContent(path);
            Console.ReadLine();
        }
    }
}
```
Related
Looking for suggestions for libraries that can generate PDF and RTF documents from stored data (not "HTML to PDF" or "URL to PDF"), with all the functionality for adding images, encryption, etc. We are currently looking for an alternative to PDFSharp-MigraDoc-GDI, which, although it works with .NET Core, does not fully support it, and we see compiler warnings: "This package may not be compatible with your project". We have also been getting issues on the IIS tier regarding GDI+, so we've decided to play it safe and find an alternative. Does anyone have a solution that they would recommend? Thanks
As far as I know, you can create whole new documents using the Microsoft.Office.Interop library; here is a post that talks about it (be careful about deploying things like this, since you might need an Office installation running on the server):
https://www.c-sharpcorner.com/UploadFile/muralidharan.d/how-to-create-word-document-using-C-Sharp/
And I've found this post about using the library to print PDFs:
How do I convert Word files to PDF programmatically?
It's not much, but I hope it helps. Regards!
I've come across ExcelPackage and found a couple of examples of using it, but none seem to work: they are all missing some component or are for a different version of Visual Studio. I simply need to generate a .xls or .xlsx or even a .csv file, but as I am using a third-party server, I can't use the Office COM objects. I have used SpreadsheetGear in the past, but it is expensive, and as I am retired, I can't afford this sort of product.
If anyone has a working example of ExcelPackage or any other freeware offering, or can point me in the direction of one that has everything I need, it would be appreciated. A regular Web App rather than MVC would be preferred.
Take a look at Simplexcel by Michael Stum. It is designed around simplicity, is fully supported under ASP.NET and should allow you to make simple but extremely usable Excel sheets. There is a simple example available here.
Check out the Open XML SDK. This gives you the ability to generate and manipulate Office documents without using Office itself or the interop, and as such makes it a suitable approach from the server-side.
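For the CSV option mentioned in the question, no library is needed at all, since Excel opens comma-separated files directly. A minimal sketch, not taken from the answers above; `CsvWriter`, `Escape` and `ToCsv` are hypothetical names, and the quoting follows the common RFC 4180 convention (fields containing a comma, quote or line break are quoted, and embedded quotes are doubled).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class CsvWriter
{
    // Quotes a field only when it contains a comma, a quote or a line break;
    // embedded quotes are doubled. Null becomes an empty field.
    public static string Escape(string field)
    {
        if (field == null) return "";
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Joins each row's escaped fields with commas; rows end with CRLF as in RFC 4180.
    public static string ToCsv(IEnumerable<IEnumerable<string>> rows)
    {
        var sb = new StringBuilder();
        foreach (var row in rows)
        {
            sb.Append(string.Join(",", row.Select(Escape)));
            sb.Append("\r\n");
        }
        return sb.ToString();
    }
}
```

In a regular Web Forms page, the resulting string could be sent to the browser by setting `Response.ContentType = "text/csv"`, adding a `Content-Disposition: attachment; filename=...` header with `Response.AddHeader`, and calling `Response.Write`.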
From within C#, I want to be able to take a DOCX file and convert it to PDF.
How can I do this?
The catch is that I would like to do other types too, e.g. images, doc files, etc.
Ideally, I would also like not to need Office installed on the computer where this software will run.
Perhaps the answer is some software that 'prints to PDF'.
My software is dealing with arrays of data representing the file, so it would ideally be some kind of API that handles byte arrays.
There aren't a ton of good C# libraries for this one. It's hard to do without COM.
Here's one option:
http://www.aspose.com/categories/.net-components/aspose.pdf-for-.net/default.aspx
If you want something free (but requires Microsoft Word to be installed), you could try using Word itself via .NET code:
http://www.codeproject.com/KB/cs/CreatePDFsForFree.aspx
It isn't the solution for everything but it can be useful at times.
DOCX is Office 2007 format. If you don't mind using the built-in functionality of Office 2007, you might want to check this link out:
http://msdn.microsoft.com/en-us/library/bb412305.aspx
Office automation plus the Save As PDF add-in?
Given a list of mailing addresses, I need to open an existing Word document, which is formatted for printing labels, and then insert each address into a different cell of the table. The current solution opens the Word application and moves the cursor to insert the text. However, after reading about the security issues and problems associated with opening the newer versions of Word from a web application, I have decided that I need to use another method.
I have looked into using Office Open XML, but I have not found any good resources that provide concrete information on exactly how to use it. Also, someone suggested that I use SQL Reporting Services, but searching for information on how to use them led me nowhere.
Which method do you think is the most appropriate for my problem?
Code samples and links to good tutorials would be extremely helpful.
Thanks for all the answers, but I really did not want to pay for a plugin and using Word automation was out of the question. So I kept searching and eventually, through some trial and error, found some answers.
After thoroughly searching through Microsoft's site, I found some newer articles on the Office Open XML SDK. I downloaded the new tools and just started going through each of them.
I then found the Document Reflector, which creates a class that generates the XML for an existing Word document (.docx). Using my label template document and the code this tool generated, I added a loop that appends table cells for each address. It actually proved to be fairly simple and much faster than using Word automation.
So, if you're still using Word automation, check out the Office Open XML tools. They're surprisingly extensive for a free download from Microsoft.
Office Open XML SDK 2.0 Download
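The fill-the-label-table loop described above can be sketched without the SDK's generated classes, working with LINQ to XML directly on the template's `word/document.xml`. This is not the code the Document Reflector produces; `LabelFiller` and `FillCells` are hypothetical names. It assumes every `w:tc` cell, in document order, is one label (real label templates often have narrow spacer columns that would need to be skipped) and that multi-line addresses use `\n`, which is turned into `w:br` line breaks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LabelFiller
{
    // Main WordprocessingML namespace (the "w:" prefix in document.xml).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds one paragraph holding the address, with w:br between its lines.
    static XElement AddressParagraph(string address)
    {
        var run = new XElement(W + "r");
        string[] lines = address.Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            if (i > 0) run.Add(new XElement(W + "br"));
            run.Add(new XElement(W + "t",
                new XAttribute(XNamespace.Xml + "space", "preserve"),
                lines[i].TrimEnd('\r')));
        }
        return new XElement(W + "p", run);
    }

    // Puts addresses[i] into the i-th table cell of the template, in document
    // order, and returns how many addresses fitted. Cell properties (w:tcPr)
    // are kept, so label sizes and borders from the template survive.
    public static int FillCells(XDocument document, IList<string> addresses)
    {
        List<XElement> cells = document.Descendants(W + "tc").ToList();
        int count = Math.Min(cells.Count, addresses.Count);
        for (int i = 0; i < count; i++)
        {
            cells[i].Elements(W + "p").Remove();          // drop the placeholder paragraphs
            cells[i].Add(AddressParagraph(addresses[i])); // a cell must keep at least one w:p
        }
        return count;
    }
}
```

To apply it to a real .docx, read `word/document.xml` out of the package (for example with `ZipArchive` in update mode), call `FillCells`, and write the modified XML back into the same entry. Addresses beyond the number of cells are not placed, so a caller would compare the return value with the list length.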
I use the Aspose.Words component from Aspose.com to do mail merges (programming guide).
You can take a look at shows 137 and 138 on dnrTV (www.dnrtv.com). In these videos, Beth Massi shows how to do some editing and mail merging with OpenXML. She does this by using the Open XML SDK and XML literals in VB. It requires no third-party components, and it doesn't require MS Office to be installed on the machine.
This video inspired me, as a C# developer (with no VB experience), to do some XML manipulation in a separate DLL written in VB. I call into this DLL from my C# application.
It is worth a try.
We have the Aspose product that tvanfosson mentioned. The edition that we purchased works with SQL Reporting Services, so it can be used with the scheduler for creating output. It is really a great product, and we used it in a system that needed to support Korean characters in the final document. It works great and was under $1K with support. Not bad.
The advantage of using a product like this is that you can continue to manage your data and the skill set required to produce the documents is at a level where a variety of developers can support its use.
Vanstee,
If you really want to do this in code, check out this post I just found on Google
http://kellychronicles.spaces.live.com/blog/cns!A0D71E1614E8DBF8!1364.entry
If you are using Reporting Services, can't you just move the information in the Word doc into a database table and read it from there, taking Word out of the equation?
The release notes of our software contain some important data that I would like to extract for every release. Is there a way to extract certain information from Microsoft Word documents?
The application that I am thinking of would be written in C#, but I am open to any other solution.
All MS Office products (Word, Excel, etc.) are fully scriptable, both internally (using VBA) and externally (via OLE Automation, also known as ActiveX; in fact, VBA uses the interface exposed through OLE).
My suggestion would be to look for a library in your language that supports this. Here is a link to a Perl module, Win32::OLE, that does: as you can see, it's quite easy to use and very powerful. The interface should be similar for other languages.
I went through this a few years back. You can:
Use Word to convert the file into some other format, ASCII, RTF, XML etc.
Use some third-party app to convert to another format, such as ASCII.
Access the Word API through OLE and extract the information directly.
I couldn't find any generic libraries to read Word files, and back then all of the applications that read Word files only worked for a subset. Word changed often enough that they had trouble keeping up.
There were some documents that listed the specifics of the older Word file formats, and the underlying file structure is outrageously complicated. Without a lot of resources, it would be hard to keep code in sync with the file format.
Initially, I used Perl to drive Word and create new documents, but the solution was too fragile. Later I switched the whole application to work with PDFs instead and gave up on Word.
Paul.
Probably not the most elegant solution, but this seems to be the lightest method: use a VBScript run with CScript.
I just tried it on a sample Word 2003 document, and it works perfectly.
More information: http://www.gregthatcher.com/Papers/VBScript/WordExtractScript.aspx
I did a lot of Excel programming with the VSTO (Visual Studio Tools for Office) tools, and I think you will be able to use the VSTO API to read a Word doc. You should be able to use C#.
You could write an IFilter to extract text from word files. No need to have Word installed.
You can work from within Word (VBA, VSTO) or outside it.
From outside it, automation is one approach.
Another is to avoid using Word entirely. If the docs are .docx, you can use anything which can manipulate an Open XML file. Microsoft has its Open XML SDK, and in the Java world you can use docx4j or POI.
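For the release-notes case, if the notes are saved as .docx, the Open XML route can be sketched with plain LINQ to XML: load `word/document.xml` from the package (it is a ZIP file) and collect the paragraphs under a given heading. This is a hedged sketch, not code from the answers; `ReleaseNotes` and `Section` are hypothetical names, and it assumes headings use Word's built-in English style IDs (`Heading1`, `Heading2`, ...), which can differ in localized or custom templates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ReleaseNotes
{
    // Main WordprocessingML namespace (the "w:" prefix in document.xml).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Style ID of a paragraph (w:pPr/w:pStyle/@w:val), or "" if none is set.
    static string StyleOf(XElement p) =>
        (string)p.Element(W + "pPr")?.Element(W + "pStyle")?.Attribute(W + "val") ?? "";

    // Concatenated text of all runs (w:t) in the paragraph.
    static string TextOf(XElement p) =>
        string.Concat(p.Descendants(W + "t").Select(t => t.Value));

    // Returns the non-empty paragraphs between the heading whose text equals
    // `heading` and the next heading paragraph.
    public static List<string> Section(XDocument document, string heading)
    {
        var result = new List<string>();
        bool inSection = false;
        foreach (XElement p in document.Descendants(W + "p"))
        {
            // Assumption: headings use the built-in style IDs "Heading1", "Heading2", ...
            if (StyleOf(p).StartsWith("Heading", StringComparison.Ordinal))
            {
                if (inSection) break; // the next heading ends the section
                inSection = TextOf(p).Trim() == heading;
                continue;
            }
            if (inSection)
            {
                string text = TextOf(p);
                if (text.Length > 0) result.Add(text);
            }
        }
        return result;
    }
}
```

A call such as `ReleaseNotes.Section(doc, "Known Issues")` would then return the text of each paragraph under that heading, which can be stored or compared from release to release.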