c# pasting contents of word document into winforms control - c#

Is there a way to paste a particular word document's contents into any winforms controls? I need the exact formatting of the word document along with any printscreens it has.

I don't believe there is any Visual Studio native control to achieve what you want to do. The RichText Box will not provide 100% formatting.
In this case you will have to find a 3rd party control.
The following is free and you have the sourcecode but it looks a bit old:
https://www.codeproject.com/articles/3582/word-control-for-net
A quick google search yielded these products.
https://www.devexpress.com/products/net/controls/winforms/rich_editor/
https://marketplace.visualstudio.com/items?itemName=NevronSoftware.RichTextEditorforWinForms-12056

Related

Does MuPdf library have unicode or text search functionality?

Background
I am working on a WPF windows application and I want add embedded PDF viewer with only basic functionalities including PDF view, text search and page navigation.
I tried embedded Internet Explorer and Adobe PDF Reader installed method (this way ) but this method is not suitable for our requirement as Adobe PDF Reader has too may external links which can not be allowed because of the security reasons of the application.
Therefore, I am trying to use moonpdf library. This library works fine with our requirements but the only problem is there is no text search functionality in this library. (I think it shows PDF as images)
Then, I have download moonpdf source code and realized that moonpdf is using libmupdf.dll wrapping to c#.
I can modify the moonpdf source code and mupdf source code for our requirement if needed.
My Question
Is there any text search functionalities in mupdf? if so how can I use it?
In the basic mupdf library, there are several functions for searching for text. These work by searching a page for a text string, in a few different variants, and returns the area for all hits of the given text. You need to iterate over the pages yourself (in order to do forward or reverse search).
fz_quad hits[1000];
count = fz_search_page(ctx, page, needle, hits, nelem(hits));
That said, I do not know how or even if "moonpdf" has wrapped these functions.
You can certainly extract the text from a document, the MuPDF library will do that. I believe it's up to you to apply your own search criteria after that. I'm afraid I'm not expert enough to answer the 'how to' part of it though. I imagine one of the mutool examples would be helpful here though. I'll see if I can get one of the developers to answer.

Creating word documents within a c# application?

I've been searching on ways to create letters using a C# Windows Form application in Visual Studio based on information from a local SQL server. I've seen some other topics but each answer seems to be really different.
My knowledge is pretty basic about this and wouldn't mind a step in the right direction. Is it actually possible? is there a better solution rather than creating word documents?
I only really need to be able to create a word document which has some text and and tables is this possible?
If you are creating a document of same kind with different data then use a Word template (.dotx) and use content controls or bookmarks in the document.
Advantage: Saves your time in manipulating with formatting and alignment in the document
Then use Open XML to just replace your content controls or bookmarks with the values.
Advantage: You dont need Word Interop assemblies to be deployed. Faster and recommended by Microsoft.
Depending on your needs, you could either simply copy a pre-existing file, or if you need to modify the document from code, you can use Microsoft Office Automation. (see C# office 2010 automation)

Databinding in existing XPS document

I have an existing XPS file that I would like to use as a template and possibly bind data to it. I have tried several methods, but cannot seem to get it to work.
Does anyone have any experience altering an existing XPS file to add data at runtime and then print or save?
Any help is appreciated.
XPS documents conform to the Open XML standard. There is an SDK for working with these docs. Here is a How-to article by Beth Massi: "Accessing Open XML Document Parts with the Open XML SDK".
Since you are working with the internal doc structure you might also check out 'Open XML Package Editor" which lets you explore the doc with Visual Studio. Here is another How-to by Beth Massi: "Handy Visual Studio Add-In to View Office 2007 Files".
+tom
it's a bit of a challenge to do this with XPS, but it is possible.
You can do this with our NiXPS SDK.
I've posted an example on my blog a while ago:
XPS variable data example
Regards,
Nick
Bindings are evaluated during the process of writing to an XPS document. So you can't set up a {Binding} in a FixedDocument, Write that FD to an XpsDocument, and expect to get that original FD back again when you next open that saved doc.
Also, the standard XpsWriter does convert everything into Glyphs on canvases, so you can't, say, a textbox in the original and expect to be able to find it after its been saved to a document.
I've never used the NiXPS libraries, so if Nick says it can be done you might want to check it out.
One last possibility--You can create placeholders in a form that you will be able to find later. They'd have to be text (something like [[{{FORMFIELDHERELOL}}]]) with some kind of delimiter scheme to differentiate the text from everything else. You could then go spelunking in the XML looking for text that fits the delimeter pattern and switch out those glyphs for your binding text. Of course, the issue with THAT is that if you aren't putting X chars in place of X chars you might find you have to do some repositioning. As its all glyphs on canvas this might be slightly harder than, say, threading a needle with a shoelace.

Using Word 2007 as CMS page editor

I have been searching for several hours but i couldn't find anything about this... Basically I would like to create a template or plug-in for word 2007 that would allow someone to create new pages for a CMS. What I have in mind is something similar to blog post template. I know how to create a basic template but I can't find a way to publish the created document using a publish button inside the Word.
thnx in advance
I understand what you are trying to achieve, but Word is the wrong starting point. I would start with a much more basic text editor.
Word is horrible, horrible, horrible. Your site will define clear styles, yet Word will output nasty HTML that won't match your website's CSS definitions.
Your best bet therefore is to have a means to drop the Word file into the site, and have code programmatically analyse it and transform it into site-valid HTML. In Java you could use Apache POI, but that's very raw still. Might be a lot easier in a Microsoft centric world.
Far better, in my opinion, is to force people to learn Markdown, or BBCode, or HTML, or to use a Styled HTML Editor in your CMS - cut and paste plain text in, then style with the CMS defined styles.
As you are using Word 2007 you can export the document as XML and then use XSLT to generate the HTML.
If your CMS has an API or import facility you could convert the output from Word to suit that interface.
You can write a Word macro to add a Publish button/menu option to Word that will generate the correct output.
It's not a bad idea since it's all about the end user. If Word produces bad HTML, you should just make it semantic correct before posting it to the CMS.
I've never done this but I'm sure that it's possible to with .NET via the "Word 2007 Addin"-template (assuming Office 2007).
Good luck!
You can do what you want if you use SharePoint 2007 as your CMS. You can set up a blog on SharePoint 2007 and post to the blog from Word. If you use Office 2007 on the client end then you will get some nice buttons like "post to my blog" etc.
If you can't use SharePoint or are talking about an existing CMS, you have a lot of hurdles to jump through. This is a major undertaking and not something you can get a simple answer out of Stack Overflow.
Have you considered using one of the freely available Javascript WYSIWYG Editors such as TinyMCE http://tinymce.moxiecode.com/? When configured with all the options, it has an impressive amount of functionality and the interface is very similar to Word. I realize this doesn't directly answer your question, but as others have pointed out starting from Word is going to be difficult.
I've been on a team that wrote a Word addin for a custom CMS system. It was written in VB6 and was able to take a Word document and turn basic formatting information - lists, bold, italic and even tables into HTML, which was uploaded to the server. It didn't create new pages or manage the site in the addin though.
I would definitely avoid choosing Word as the editor for your CMS from my experience. The biggest issue is each time you want to update the addin you have to redistribute it to the company or companies using it. You can do this is as an IE active-x control but it's far easier just to handicap the user to a limited set of styling options via a Javascript editor.
Word does have a powerful API for manipulating your content with, however we needed to disable so many options in Word to avoid unwanted fonts and so on, it resembled Wordpad more than Word in the end.
If it's a greenfield project and you have the time, I would infact recommend using Silverlight 4.0 over a Javascript editor. Version 4.0 has a richtextbox control built in, plus there is also the excellent Vectorlight one.
May be it helps you, umbraco CMS allow editing with Microsoft Word.
For some reason this is a feature that Excel enjoys but not Word.
Excel can can automatically publish an HTML file version of your document when you save it.
Unfortunately Word seems to only be able to achieve this functionality when using Sharepoint, which is a shame because it can be quite useful.
What you can do, short of creating your own add-in is to add a bit of code to your template to create a HTML copy of your document whenever the user saves it.
First, make sure your template is macro-enabled (saved as .dotm file).
Second, while editing the template in Word, open the VBA code editor (ALT-F11)
In the project list double-click on your document to open its code-behind file.
Add the following bit of code to it, modifying the ActiveDocument.SaveAs path to something more appropriate to you, like a shared network folder where your CMS exposed by your CMS.
Sub FileSave()
' First Save the main document
ActiveDocument.Save
' Now we create a new document based on the current one
Selection.WholeStory
Selection.Copy
Documents.Add
Selection.PasteAndFormat wdPasteDefault
' Save it as HTML and close it
ActiveDocument.SaveAs "c:\temp\mydoc.html", fileformat:=wdFormatHTML
ActiveDocument.Close
End Sub
This will copy the original file into a blank new one that will be saved to HTML and closed before returning to the original file.
You can check some of the options to the Documents.Add if you want to use a different template than the normal one.
Security
because this template contains macros, you will have to install it with the other templates where Word expect them.
If you don't, then you'll get a security warning.
To avoid getting it, you can add the path where your templates are located to the list of Trusted Locations under Word's Options > Trust Center > Trust Center Settings > Trusted Locations.

Parsing Office Documents

I`d like to be able to read the content of office documents (for a custom crawler).
The office version that need to be readable are from 2000 to 2007. I mainly want to be crawling words, excel and powerpoint documents.
I don`t want to retrieve the formatting, only the text in it.
The crawler is based on lucene.NET if that can be of some help and is in c#.
I already used iTextSharp for parsing PDF
If you're already using Lucene.NET you might just want to take advantage of the various IFilters already available for doing this. Take a look at the open source SeekAFile project. It will show you how to use an IFilter to open and extract this information from any filetype where an IFilter is available. There are IFilters for Word, Excel, Powerpoint, PDf, and most of the other common document types.
There is an excelent open source project POI, only drawback - it is written for Java.
The .net port is somehow very beta.
Here is a good list of various tools for converting Word documents to plaintext, which you can then do whatever with.
Here's a nice little post on c-charpcorner by Krishnan LN that gives basic code to grab the text from a Word document using the Word Primary Interop assemblies.
Basically, you get the "WholeStory" property out of the Word document, paste it to the clipboard, then pull it from the clipboard while converting it to text format. The clipboard step is presumably done to strip out formatting.
For PowerPoint, you do a similar thing, but you need to loop through the slides, then for each slide loop through the shapes, and grab the "TextFrame.TextRange.Text" property in each shape.
For Excel, since Excel can be an OleDb data source, it's easiest to use ADO.NET. Here's a good post by Laurent Bugnion that walks through this technique.
You might also consider checking out DtSearch (www.DtSearch.com). Although it is primarily a searching tool, it does a great job of extracting text from a large number of file types and is considerably cheaper than other options like the Oracle/Stellent OutsideIn technology or the equivalent from Autonomy.
I've been using DtSearch for years and find it indispensible for this type of task.

Categories

Resources