Is it possible to create multilingual help (.chm) file? - c#

I need to create multilingual help (.chm) file for my WPF application. Please suggest best way to create it.

I'd try to steer clear of that if possible. CHM is a proprietary format and, although it has been reverse engineered, I think you'll get far more benefit from doing a truly portable solution like on-disk HTML.
Back when we still used CHM files, we found no easy way to embed multi-language capability into a single file and we had to provide translations in independent CHM files, leading to massive duplication of things like charts, pictures and so forth (this was many years ago so you should check if the situation has improved since then, if you really want to use CHM).
The support for Unicode was, shall we say, less than adequate, and there were numerous security problems which caused many customers to disallow use of CHM files - seriously, who in their right mind allows arbitrary code to be run by a help system?
With on-disk HTML, not only did this duplication disappear (since each language version included common images), we also got much better Unicode support and the ability to have a default front page (in English) with links to alternative front pages for other locales.
And we gained a big boost in portability since it's an open standard. That means we could pretty much run it in any browser on any platform.
And, on top of that, it appears Microsoft don't support it any more. From the Wikipedia CHM article:
In 2002, Microsoft announced security risks associated with the .CHM format, as well as security bulletins and patches. They have since announced their intentions not to develop the .CHM format further.

As the comments in the previous answer state, the CHM format is both a very old format by Microsoft, as well as being proprietary. Distributing HTML files with your application will accomplish the same functionality as a single CHM file. Worrying about the language of the user interface the user will see is more of a non-issue; chances are if the user is reading Italian help, he or she will be using a (1) Italian-localized version of Windows and (2) an Italian-language localized web browser.
That being said, because CHM is an old format and seemingly partially unsupported now, you can generate the same file based on the reversed engineered specification that the CHM files follow. Furthermore, because CHM is merely a binary file format emcompassing HTML files, encoding the HTML files using UTF-8 will accomplish getting the help documents themselves in whichever language you desire.
There is no Microsoft-supported CHM .NET API, so you'll have to output the Binary yourself using Streams / BinaryWriter.

Related

Batch conversion of docx to clean HTML

I'm starting to wonder if this is even possible. I've searched for solutions on Google and come up with nothing that works exactly how I'd like it to.
I think it'd benefit to explain what that entails. I work for database group at my university's IT department. My main job is to take specs of a report in a docx file, copy that over to dreamweaver, fix some formatting, and put it onto their website. My issue is that it's ridiculously tedious to do this over and over. I figured, hey, I haven't written anything in C# for some time now, perhaps I could write an application to grab a docx file, convert it to HTML, fix the CSS, stick the header, and footer from the webpage on there, and save the result. I originally planned to have it do one by one, but it probably wouldn't be difficult to have it input a list of files and batch convert.
I've found these relevant topics on how to accomplish this, but they don't fit my needs well enough.
http://www.techrepublic.com/blog/howdoi/how-do-i-modify-word-documents-using-c/190
This is probably fine for a few documents, but since it's just automating an instance of Word, I feel like it'd be slow and memory intensive. I'd prefer to avoid opening and closing an instance of Word 50+ times.
http://openxmldeveloper.org/articles/333.aspx
This is what I started using. XSLT had the benefit of not needing word to be installed nor ran for each file. After some searching I got a proof of concept working. It takes in a docx file, decompresses it, grabs the document.xml from that, and uses the DocX2Html.xsl file I scavenged from OpenXML viewer. I believe that was originally provided by MS for sharepoint servers to provide the ability to render word documents in a browser. Or something along those lines.
After adjusting that code to fit my needs, and having issues with the objXSLT.Load () method, I ended up using IlMerge to make the XSL into a DLL. No idea why I kept getting a compile error when using the plain old XSL file, but the DLL worked fine, so I was satisfied. Here (http://pastebin.com/a5HBAakJ) is my current code. It does the job of converting docx to HTML just fine (other than random spaces between some words), but the result file has ridiculously ugly HTML syntax. An example of this monstrosity can be found here (http://pastebin.com/b8sPGmFE).
Does anyone know how I could remedy this? I'm thinking perhaps I need to make a new XSL file, as the one MS provided is what's responsible for sticking all those tags and extra code in there. My issue with that is that I don't know anything about how to do that. Perhaps there's an alternative version already out there. All I'd need is one that will preserve tables and text formatting. Images aren't needed.
This looks like just what you need: http://msdn.microsoft.com/en-us/library/ff628051(v=office.14).aspx
The author Eric White blogged about his experiences developing that tool. You can see that list of posts on his blog here: http://blogs.msdn.com/b/ericwhite/archive/2008/10/20/eric-white-s-blog-s-table-of-contents.aspx#Open_XML_to_XHtml
Since I'm a big fan of Aspose.Words, a commercial library to create/process Word documents, I would do something like:
Open the Word document with Aspose.Words.
Save the Word document as HTML.
Use something like SgmlReader or HTML Agility Pack (or even Regular Expressions if it is suitable) to remove unwanted HTML tags/attributes.
Since you wrote you work at an university, I'm not sure whether commercial packages are an option, though.
Hi not sure what the rules are on promoting your own solutions, so do let me know if I am out of line.
I am a web developer who had the same issues, so I created my own tool:
http://www.convertwordtohtml.com
We are also working on a new version that will have even better conversion quality and one click conversion eg you can right click on a word file and it will be directly converted to html and the code placed into the clipboard. The current version also supports command line access and the new version will have a server version to.
There is a free trial version downloadable from the site , and if you have any questions do contact me any time.

Give users the ability to edit Word 97-2010 documents within a .NET application

I was wondering if anyone has any idea of any product/method to give my end users the ability to edit Word documents within our C#/.NET application, avoiding the use of Automation and separate instances of Word opening outside of the application. This is a possibility [backup plan!] - but one that I'd rather not have to implement (due to the amount of work involved and having users exit our application).
I know that I could possibly use the WebBrowser control - but from what I've been able to find -- support for this is sketchy at best, and things such as toolbars are not present, and it does not appear to work with Word 2010 anyway.
I've been evaluating a few products that claim to do this but many are lacking in features or produce compatibility errors within documents rendering them useless when opened in Word.
We are using Word 2003 and Word 2010. Our documents start out as .DOCX files through our custom merge/templating processes.
Any suggestions for products or other ideas would be great.
Edit:
We're creating documents without issue using OpenXML. Fun stuff, works really well. However, at the end of the day I would prefer to have users editing the created documents as well as legacy documents (created as .DOC files) within our .NET application directly. Unforunately, with Microsoft removing the ability to embed via ActiveX/OLE, etc. there isn't a way to do this. What I am looking for is a 3rd party product to achieve this, which should be virtually 100% compatible with both the .DOC and .DOCX formats.
For those asking why ? Security, ease of use, etc. We are storing documents in a database. Once I start dropping files on the filesystem and working with Automation support/macros, ... there's a lot of things that would have to be done to get the files back into the database / update, etc. This is made especially difficult since Word doesn't expose the raw bytes[] of a document and files must be saved as temporary files somewhere on the fs. Just a lot of headaches.
So, the "easiest" solution - embed Word [seems not possible] or use a 3rd party product that supports editing .DOC/.DOCX files.
An example is DevExpress XtraRichEdit control - unfortunately, while it supports a lot of nice Word-like/compatible features it only works with .DOCX files.. and isn't 100% feature complete, compared to Word.
The file structure of a word document is huge, it could take hundreds of man hours to program even limited .doc/docx support. What exactly is the reason for using your program to edit a word file over word itself?
I am not exactly sure how Word 2003 has .docx support though, my understanding is there was only a word viewer release when Office 2007 was released, it of course has been years since thats been a problem.
If you are going to actually do this only add support for .doc files since there is more information out there, you can allow word itself to handle the converstion to a .docx file if you want.
You are not going to find a third party product that does this. The amount of effort required to build an app that 100% supports the Word formats is beyond consideration. Not just every feature, but every bug as well would have to be duplicated. Considering the potential legal pitfalls of doing such, no one in their right mind would bother trying. The legal aspects, incidentally, is one of the primary reasons for the new formats.
Which means you have to go external. There are two really good options here.
One would be to hook into Office Live to give them the ability to edit Microsoft Documents online.
Another possibility is to just leverage Sharepoint in your application. It has built in methods for document workflow and integrates nicely with Office.
A third possibility would be to write your own word add-in which would take care of saving / loading the documents from your system. I'd go with the first two above before going this route.
This used to be supported through a feature called OLE Embedding. Support for it has been disappearing from Microsoft software and tools over the past 10 years. Notably .NET has no support for it whatsoever. Office was one of the last hold-outs with 2007 already getting pretty cranky about it. But this indeed looks to be completely gonzo in the 2010 edition. All download links to the DSOFramer control, a generic ActiveX embedding control were removed around the time that 2010 went into beta.
There's no future here, look at VSTO for the road ahead.
Word Automation Services and Office Web Apps (requires SP 2010).
Certainly not 100% coverage of Word features, but have you tried ASPOSE.Words.NET Total or TXTextControl.NET?

Compare PDF documents in Adobe Acrobat via SDK

We are planning on implementing a solution for comparing different revisions of a PDF document in our .Net Windows Forms application. In Adobe Acrobat there is a nice feature for comparing two documents, but I have not been able to find any information about whether it is possible to create a plug-in (or something else) to this feature from our application.
I would really appreciate it if any of you could point me in the direction to how I should go about to make such a solution.
I have also looked at other threads here at Stackoverflow for comparing PDF documents, particularly these threads:
How to compare two PDF-files
PDF-libraries
I did not really find a good solution there for a library or SDK letting us create a good solution for comparing PDF-documents in a way which is easy to understand for users of the system.
Do you know any good solutions to solve this problem?
All help appreciated! :)
Do you know the pdf files? or you just want to make the compare without knowing it. If you know the pdf files, you can use variables values on the specific fields and compare the values between files, instead comparing the entire pdf file.

Best way to store multiple revisions of a text file to a single file

I'm working on a C# application that needs to store all the successive revisions of a given report file to a single project file: each time the (plain text) report file changes, the contents of the new version shall be appended to the project file, along with some metadata. Other requirements:
each version of the report file is 100 kB to 1 MB. Theoritically, the maximum number of revisions is unlimited but it should be less than 1000 in practice.
to keep things simple, I'd like to avoid computing differences between the revisions of the report - just store the whole report to the project file every time it has changed.
the project file should be compressed - it doesn't need to be a text file
it should be easy to retrieve a given version of the report from the application
How can I implement this in an efficient way? Should I create a custom binary file, consider using a database, other ideas?
Many thanks, Guy.
What's wrong with the simple workflow?
Un-gzip file
Append header and new report
Gzip project file
Gzip is a standard format, so it's easily accessible. Subsequent reports probably won't change that much, so you'll have a great compression ratio. To file every report, just open the file and scan the headers. (If scanning doesn't work, also mirror the metadata in an SQLite database, and make sure to include offsets into the project file so you can seek to the right place quickly.)
If your requirements are flexible (e.g. that "shall append" part) and you just want something to keep track of past versions of the file, a revision control system will do all of what you need quite easily.
No need to implement that. I would suggest you to use source control. Personally I use subversion with TortoiseSVN client. There is also a plug-in that integrates Subversion with Visual Studio, VisualSVN. Have a look at them.
If using SVN is not an option, you can just store each revision in an individual file (with filename that represents date for example). You can use separate files for metadata as well. Then all the aforementioned files are zipped into one file (look at http://DotNetZip.codeplex.com/ for example).
I don't think there is much point building this yourself when there are already tens, if not hundreds, of systems that are basically designed to do exactly this - source control systems.
I'd recommend choosing some source control solution that has bindings to C# and store your document in there. Then you can easily check out any revision of the document. You will also be able to diff, branch, etc. if necessary.
To give just one example to get you started you can use Subversion with C# bindings.
You could use alternate data streams to store the old revisions of your file. There is no built-in support in the .NET framework, but there exist some helper classes and articles like here and here.
I have never used this myself, so I can't really tell if this is a good option. But it seems, it would make an elegant solution, since you could store each file version in a separate data stream and only the current version in the "main file". In any case, it will probably only work on NTFS drives.
I think that the already SVN (or another source control system) is a very good idea because source control seems to have exactly the features you require. But if that's not an option you could use a file database like SQL Server Compact Edition or SQLite.

ASP.Net Converting and Merging documents into single PDF

I need to have the ability to convert and merge various documents into a single Pdf.
The documents could be of varying types, such as Word, Open Office, Images, Text, Web pages (by URL) and the PDF would usually consist of 2-3 documents.
At the moment, we are using BCL Technologies easyPDF with Microsoft Office installed onto the Server. This handles most documents but we haven't had it doing Open Office ones yet.
We currently produce around 100-1000 of these PDF's per day.
The reason I am asking the question is that performance is a key issue. The PDF is generated for users on the fly and so the waiting times we are currently getting of 30-60 seconds is becoming unacceptable.
We have done some caching around documents when they are intially uploaded so the main tasks that happens when a User requests a Pdf is merging a number of already generated Pdf's.
Does anyone else have any other tools they have used that work reliably for most common document types and above all, quickly? When put like that, it seems like I'm asking a lot!
Edit:
Thanks for all the great advice, I'll look into some of these and compare performance.
Just to add to all this, money is not really an object. We're more than happy to pay for different applications to perform each task as well as looking into various hardware options to distribute the load as much as possible.
Merging multiple PDF documents is normally simple enough (as long as they don't need to be merged on the same page) - you could compare your merge performance with something like iTextSharp (.NET version of iText) to be sure it isn't a bottleneck - otherwise the conversion from other formats to PDF is likely the bottleneck.
In almost all cases, the method used to convert X to PDF is to execute the applications print command, targeted at a software PDF printer, to create a temporary PDF file.
This means:
The target application (for example Office) is opened and closed
The document has to travel through the printing service
In your situation, are you converting arbitrary documents submitted by the users, or do the documents come from a stored library of files? If it's a library, you could make a PDF copy of each file as it is added to the library (instead of when the user makes a request), and then only merge the PDF files.
We use ABC Pdf. I don't know if it will be fast enough for your needs, but it seems to work for our use.
I had a very similar issue where we had documents that were already existing in PDF format and needed to allow the user to see them all combined together. We purchased the PDF4NET product which was about $500 from what I recall. It was extremely easy to use and they provide awesome examples of how to use the tools.
O2 Solutions - PDF4NET
Here is the code sample that they provide for merging. The top line looks like it just outputs the file, the second 2 lines allow for streaming the content back to the user.
PDFFile.MergeFilesToDisk( "append.pdf", "unicode.pdf", "multicolumntextandimages.pdf" );
PDFDocument doc = PDFFile.MergeFilesToDoc( "append.pdf", "unicode.pdf", "multicolumntextandimages.pdf" );
doc.SaveToStream( stream );
You say you're using Microsoft Office to open these files, I would imagine this is the bottleneck rather than the actual PDF creation.
Is it possible to distill these documents into a more accessible format (html/xml/database), so that it's not necessary to open office every time a PDF needs to be created?
While I have no PDF conversion suggestions I can say that this problem sounds like one which could be distributed over a number of nodes. Do you find that the PDF generation is CPU-bound or are there other limiting factors? Before expending too much effort on rewriting the PDF library interface you might want to see what the bottlenecks are.

Categories

Resources