Highlighting a link which contains a specific word using C#

I am extracting HTML from a website and writing the output HTML to a Word document using C#. I have completed this using HttpWebRequest and HTML parsing. My final step is to highlight a specific link in the Word document if it contains an anchor keyword. For example, the HTML I wrote to Word contains several links like
Kroger recalls selections of spinach packages
Kroger recalls spinach over Listeria risk
Buy Safeway And Sell Whole Foods
These are all hyperlinks. I want to highlight those links which contain the anchor keyword Kroger, so only the first two links would be highlighted in the Word document. How can I achieve this? Thanks in advance.

This article may help to achieve what you are looking for:
Search and highlight text in MS Word through C#
From there, using Microsoft.Office.Interop.Word, you can:
define a range in your document specifying its start and end positions (see this MSDN page for more info)
specify a value for the range's HighlightColorIndex property (e.g. a value of wdYellow). More info on MSDN also for this one.
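As a rough sketch of the two steps above applied to the links in the question: instead of computing start and end positions by hand, it walks the document's Hyperlinks collection and highlights each matching link's own range. The class and method names are hypothetical. The document is typed as dynamic (late binding) only so that the sketch compiles without a reference to the interop assembly; with Microsoft.Office.Interop.Word referenced, the literal 7 would be written as WdColorIndex.wdYellow.

```csharp
using System;

public static class LinkHighlighter
{
    // Numeric value of WdColorIndex.wdYellow in the Word object model.
    private const int WdYellow = 7;

    // Case-insensitive check of a link's display text against the keyword.
    public static bool ContainsKeyword(string linkText, string keyword)
    {
        return !string.IsNullOrEmpty(linkText)
            && linkText.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // "document" is assumed to be an open Word Document object.
    // Returns the number of hyperlinks that were highlighted.
    public static int HighlightLinks(dynamic document, string keyword)
    {
        int highlighted = 0;
        foreach (dynamic link in document.Hyperlinks)
        {
            string text = (string)link.TextToDisplay;
            if (ContainsKeyword(text, keyword))
            {
                // The hyperlink's Range already spans the link text.
                link.Range.HighlightColorIndex = WdYellow;
                highlighted++;
            }
        }
        return highlighted;
    }
}
```

Called as LinkHighlighter.HighlightLinks(doc, "Kroger") on the example above, this would be expected to highlight the two Kroger links and leave the Safeway link untouched.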

Related

Replacing Microsoft Word Content Control Dynamically At Runtime

I want to be able to bind content control fields to each other's values. Basically, if you change a field at the top, all others in the document also update to that value. I'm replacing hundreds of individual variables, each with 100 duplicates. There must be a better way than the 'Find and Replace' tool.
Here is a sample document directly from Microsoft's site that shows exactly what I would like to be able to do:
https://omextemplates.content.office.net/support/templates/en-us/tf03444179.dotx
When the '' value is changed, all others in the document update.
I've already looked at plenty of solutions like: c# word interop find and replace everything
But they do not dynamically respond during run-time. In other words you have to go in and change which string you want to replace for each value.
Been looking for a while now, thanks in advance if anyone else can figure this out.

Compare word documents(.docx) with a document template(.dotx)

Is there any way I can compare a Word document (.docx) with a document template (.dotx) generated in Microsoft Word?
I want to do this comparison programmatically using C#.
I want to compare both documents word by word so that I can determine which template the document belongs to. I don't just want to compare the size of both; I want to compare the contents as well.
From this comparison I want to get the following results:
Which document template the document was generated from.
Where in the document template a particular piece of information is stored.
For example, if I want to find a person's contact information, I want to traverse the document and check where the template has the area/section for the address (i.e. top-left corner, top center, in a paragraph, in the body, etc.).
In the same way I want to extract other information too, such as links to other documents.
After getting those positions I want to get that information from the .docx file.
Say I found that the address is in the top-left and there are five links referring to other documents in five different paragraphs. Then I want to get the address and save it to a variable. After that I want to replace those link placeholders with actual hyperlinks, i.e. if a link refers to Doc-A, then instead of just showing plain text I want to replace it with a hyperlink to Doc-A.
Any suggestions?
Thank You.
Your question is rather too vague and involved to give a really good answer, however...
To find out from which template a document was generated, the object model provides the property Document.AttachedTemplate, which returns the attached template (its FullName gives the full file name). This is certainly better than comparing word by word (which is also very time-consuming).
The Word object model also provides the method CompareDocuments (belongs to the Word.Application class). This will "highlight" differences in the text content of two documents.
Links will be found in the Document.Hyperlinks collection
Getting the position of things is a bit chancy with Word and it depends on what you really mean by "top-left", etc. Better would be to construct the templates using content controls, form fields and/or bookmarks so that you can uniquely identify important sections. However, Word does provide the Range.get_Information method that can return relative and absolute positions on the page if that's what you really want.
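A minimal sketch of the first and third points, assuming an already opened Word document object: it reads the attached template's full path and collects the target addresses from the Hyperlinks collection. The class and method names are hypothetical, and dynamic (late binding) is used only so the sketch compiles without a reference to the interop assembly.

```csharp
using System;
using System.Collections.Generic;

public static class TemplateInspector
{
    // Full path of the template the document is attached to.
    public static string GetTemplatePath(dynamic document)
    {
        return (string)document.AttachedTemplate.FullName;
    }

    // Target addresses of all hyperlinks that have one.
    public static List<string> GetLinkAddresses(dynamic document)
    {
        var addresses = new List<string>();
        foreach (dynamic link in document.Hyperlinks)
        {
            string address = (string)link.Address;
            // Links that only point inside the document have no Address.
            if (!string.IsNullOrEmpty(address))
            {
                addresses.Add(address);
            }
        }
        return addresses;
    }
}
```

Comparing the returned path against the known .dotx files would then identify the template without any word-by-word comparison.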

Best way to format randomly copied texts from other websites?

Problem:
My site allows users to copy/paste content from other files/documents like MS Word and websites (e.g. CNN.com) into the Rich TextEditor we provide. This Rich TextEditor supports (and we too have to support) pasting content with embedded styles, which brings in random styles, tags, and inline styles from the content's origin.
E.g.: if you paste from any MS Word document, it brings H1, H2, P, UL/OL/LI, STRONG, I, EM, TABLE etc. with their own styles. The same happens when you copy/paste from other webpages.
How To Format?
I am looking for THE best way to handle the formatting of this kind of user-generated content. First, I need to keep the copied tags intact. Let's say an H1 was brought in by the user from MS Word: I have to keep it, yet style it on my own using the given corporate branding.
Another problem is that when you copy/paste from an external origin, some tags are not properly closed, which causes my layout to break. How do we handle this?
For styles, I am applying
.article * {
allKnownCSSProperties: myValues!important;
}
Any method would work; JavaScript or C# is preferred.
To strip out unwanted styles a simple regex would suffice. In JavaScript:
/( style=['"][^'"]*['"])/g
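Since the question prefers C#, here is a hedged sketch of the same pattern using System.Text.RegularExpressions. The class and method names are hypothetical, and RegexOptions.IgnoreCase is an addition not present in the JavaScript version. Like the original pattern, it will miss a style attribute whose value itself contains a quote character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StyleStripper
{
    // Same shape as the JavaScript pattern: a space, style=, a quote,
    // any run of non-quote characters, and a closing quote.
    private static readonly Regex InlineStyle =
        new Regex(@" style=['""][^'""]*['""]", RegexOptions.IgnoreCase);

    // Returns the HTML with every matching inline style attribute removed.
    public static string StripInlineStyles(string html)
    {
        return InlineStyle.Replace(html, string.Empty);
    }
}
```

For example, StyleStripper.StripInlineStyles("<p style=\"color:red\">Hi</p>") yields "<p>Hi</p>", leaving the tag itself intact so the site's own stylesheet can take over.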
I would try to solve the problem of unclosed tags like this:
Parse the whole message and collect every opening tag that does not end with />, removing a tag from the collection when you find a matching tag that starts with </. Exclude tags that do not need a closing tag, then generate closing tags for everything still in the collection and place them at the end of your Rich TextEditor content. It may not work in some cases and may look clumsy, but it is the first thing that comes to mind and it may help solve the problem.
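The approach described above could be sketched roughly as follows. This is a regex-based approximation rather than a real HTML parser; the class and method names are hypothetical, and the list of void elements (tags that never take a closing tag) is an assumption based on the standard HTML set.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class TagCloser
{
    // Elements that never take a closing tag.
    private static readonly HashSet<string> VoidTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "area", "base", "br", "col", "embed", "hr", "img",
            "input", "link", "meta", "param", "source", "track", "wbr"
        };

    // Matches opening, closing, and self-closing tags.
    private static readonly Regex Tag = new Regex(
        @"<(?<close>/?)(?<name>[A-Za-z][A-Za-z0-9]*)[^>]*?(?<self>/?)>");

    public static string AppendMissingCloseTags(string html)
    {
        var open = new List<string>();
        foreach (Match m in Tag.Matches(html))
        {
            string name = m.Groups["name"].Value.ToLowerInvariant();
            if (VoidTags.Contains(name) || m.Groups["self"].Value == "/")
            {
                continue; // needs no closing tag
            }
            if (m.Groups["close"].Value == "/")
            {
                // A closing tag cancels the most recent matching opener.
                int i = open.LastIndexOf(name);
                if (i >= 0) open.RemoveAt(i);
            }
            else
            {
                open.Add(name);
            }
        }

        // Close whatever is still open, innermost first.
        var result = new StringBuilder(html);
        for (int i = open.Count - 1; i >= 0; i--)
        {
            result.Append("</").Append(open[i]).Append('>');
        }
        return result.ToString();
    }
}
```

Because the closing tags are appended at the very end, the result is balanced but not necessarily what the author intended; a dedicated HTML parser would likely recover more gracefully.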

Spell check merge fields in MS Word programmatically in C# using word interop

I have been trying to enable spell check for mergefields after they have been resolved to text (after mail-merge). By default Word does not do it. The workaround is to set the proofing language in Word and un-check the "Do not check spelling or grammar" tick box, however, I want to do this programmatically. I have tried setting Range.LanguageID and other options using Word Interop with no results. The text that comes from mergefields is not spell checked (spelling errors are not underlined). Can you please advise how this could be resolved? Thanks
In the end I modified the rtf document. I realised that there were \noproof tags in the rtf document and after finding out what they meant I decided to remove them. I've also removed \lang1024 and \langfe1024 which seemed to appear before each \noproof tag. Apparently they store information about the language of the formatted field/paragraph. The end result is that the text that comes from merge fields is now spell checked, which is what I wanted. I haven't been able to find any other solution. I hope this post helps someone else as well.

How to retrieve all the paragraphs on a specified page using C# (4.0) and Open XML SDK (2.0)?

We are developing a C#.NET (4.0) Windows Forms application that uses the Open XML SDK (2.0) to manipulate MS Word files. I want to get all the paragraphs on a particular page: the user is prompted for a page number of the Word file, and the application should return all the paragraphs on that page. How do I do it?
Taking a quick look at the underlying XML, it doesn't look like there is an attribute on the paragraph element that will tell you which page it will appear on. The best suggestion I can give you is to have some placeholder text at the top and bottom of each page, then search for a certain instance of the placeholder text based on which page the user specifies. Once you have a starting point you could retrieve all paragraphs between the two placeholder paragraph elements.
For example, if a user enters page two, you would search for the third instance of a paragraph that contains this placeholder text (the first two instances mark the top and bottom of page one) and then retrieve all paragraphs until you reach the next instance of the placeholder text. I know this isn't ideal, but it's one workaround I could think of that might be feasible.
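The selection logic of this workaround can be sketched independently of the SDK. The sketch below assumes the paragraph texts have already been read in document order (with the Open XML SDK, for instance, from the InnerText of each Paragraph in the document body) and that every page starts and ends with a placeholder paragraph. The class name, method name, and marker text are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class PageParagraphs
{
    // Returns the paragraphs between the opening and closing placeholder
    // of the requested page (1-based). Page p opens at placeholder
    // instance 2p - 1 and closes at instance 2p.
    public static List<string> GetPageParagraphs(
        IList<string> paragraphs, int pageNumber, string marker)
    {
        if (pageNumber < 1)
        {
            throw new ArgumentOutOfRangeException("pageNumber");
        }

        var result = new List<string>();
        int startInstance = 2 * pageNumber - 1;
        int seen = 0;

        foreach (string text in paragraphs)
        {
            bool isMarker = text.Contains(marker);
            if (isMarker)
            {
                seen++;
                if (seen > startInstance) break; // closing placeholder
            }
            else if (seen == startInstance)
            {
                result.Add(text); // inside the requested page
            }
        }
        return result;
    }
}
```

For page two this skips ahead to the third placeholder and collects paragraphs until the fourth, matching the example above. If the requested page does not exist, the result is simply empty.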
