Best way to format randomly copied texts from other websites? - c#

Problem:
My site allows users to copy and paste content from other files and documents, such as MS Word, and from websites (e.g. CNN.com) into the Rich TextEditor we provide. This Rich TextEditor supports pasting content with embedded styles (and we have to support it too), which brings in random styles, tags, and inline styles from the content's origin.
E.g.: if you paste from any MS Word document, it brings H1, H2, P, UL/OL/LI, STRONG, I, EM, TABLE, etc. with their own styles. The same happens when you copy and paste from other web pages.
How To Format?
I am looking for THE best way to handle the formatting of this kind of user-generated content. First, I need to keep the copied tags intact. Let's say an H1 was brought in by the user from MS Word: I have to keep it, yet style it myself according to our corporate branding.
Another problem is that when you copy and paste from an external origin, some tags are not properly closed, and this breaks my layout. How do we handle this?
For styles, I'm applying
.article * {
allKnownCSSProperties: myValues!important;
}
Any method would work; JavaScript or C# is preferred.

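To strip out unwanted inline styles, a simple regex will usually suffice. In JavaScript:
/\sstyle\s*=\s*("[^"]*"|'[^']*')/gi
Matching each quote type separately keeps a value such as style="font-family:'Arial'" from being cut short at the inner quote.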
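For anyone who would rather do this on the server, here is a minimal C# sketch of the same idea. The names StyleStripper and StripInlineStyles are made up for illustration. A regex over HTML is only an approximation: it will also strip body text that merely looks like a style attribute, so a real HTML parser would be more robust.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class StyleStripper
{
    // Whitespace, then style="..." or style='...'.
    // Each quote type is matched separately so nested quotes of the other kind survive.
    private static readonly Regex InlineStyle = new Regex(
        @"\s+style\s*=\s*(?:""[^""]*""|'[^']*')",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the markup with every inline style attribute removed.
    public static string StripInlineStyles(string html)
    {
        return InlineStyle.Replace(html, string.Empty);
    }
}
```

The tags themselves (H1, P, TABLE and so on) are left untouched, so the site's own stylesheet can take over the styling.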

I'd try to solve the problem of unclosed tags like this:
Parse the whole message and collect every opening tag that does not end with />, removing a tag from the collection when you find a matching tag that starts with </. Exclude tags that need no closing tag (such as br or img), then generate closing tags for everything still in the collection and place them at the end of your Rich TextEditor content. It may not work in some cases and may look clumsy, but it's the first thing that comes to mind and it may help solve the problem.
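A rough C# sketch of that approach, under some assumptions: TagBalancer and CloseOpenTags are hypothetical names, tags are found with a regex rather than a real parser, and the only repair is appending the missing closing tags at the end (innermost first). Stray closing tags and misnested markup are left alone.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class TagBalancer
{
    // Elements that never take a closing tag in HTML.
    private static readonly HashSet<string> VoidElements =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "area", "base", "br", "col", "embed", "hr", "img",
            "input", "link", "meta", "source", "track", "wbr"
        };

    // Group 1: "/" for a closing tag. Group 2: tag name. Group 3: "/" for a self-closing tag.
    private static readonly Regex TagPattern = new Regex(
        @"<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(/?)>", RegexOptions.Compiled);

    public static string CloseOpenTags(string html)
    {
        var open = new List<string>(); // tags opened but not yet closed, in document order

        foreach (Match m in TagPattern.Matches(html))
        {
            string name = m.Groups[2].Value.ToLowerInvariant();
            bool isClosing = m.Groups[1].Value == "/";
            bool isSelfClosing = m.Groups[3].Value == "/";

            if (isSelfClosing || VoidElements.Contains(name))
                continue; // needs no closing tag

            if (isClosing)
            {
                // Cancel the most recent unclosed tag with the same name, if any.
                int index = open.LastIndexOf(name);
                if (index >= 0) open.RemoveAt(index);
            }
            else
            {
                open.Add(name);
            }
        }

        // Append closing tags for whatever is still open, innermost first.
        var result = new StringBuilder(html);
        for (int i = open.Count - 1; i >= 0; i--)
            result.Append("</").Append(open[i]).Append('>');
        return result.ToString();
    }
}
```

For example, CloseOpenTags("<div><p>Hello <b>world") returns "<div><p>Hello <b>world</b></p></div>", which at least stops the pasted fragment from swallowing the rest of the page layout.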

Related

Replacing Microsoft Word Content Control Dynamically At Runtime

I want to be able to bind content control fields to each other's values. Basically, if you change a field at the top, all the others in the document update to match. I'm replacing hundreds of individual variables, each with 100 duplicates. There must be a better way than the Find and Replace tool.
Here is a sample document directly from Microsoft's site that shows exactly what I would like to be able to do:
https://omextemplates.content.office.net/support/templates/en-us/tf03444179.dotx
When the '' value is changed, all others in the document update.
I've already looked at plenty of solutions like: c# word interop find and replace everything
But they do not dynamically respond during run-time. In other words you have to go in and change which string you want to replace for each value.
Been looking for a while now, thanks in advance if anyone else can figure this out.

How to automatically extract strings from a project and add them to a .resx file in WPF

I have a very large WPF project and, after two years of development, I have been asked to make it bilingual. Is there any way to automatically extract the strings in labels and grid headers in XAML files so I can translate them, instead of extracting them manually?
I don't know of an existing tool for this, but you can implement one yourself. I faced a similar problem when converting a large solution of more than 100 C++ projects from ANSI to Unicode strings. I implemented a simple tool with one window containing two rich edit boxes: one for the original text with the replacements colored, and the other for the replacement text. For manipulating the .resx files, have a look at Working with .resx Files Programmatically. Don't forget that regular expressions are your friend when building such a tool.
You can automate this process by finding all Label controls in the WPF root Grid element (e.g. mainGrid) and extracting their content using the following C# code snippet:
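// Requires: using System.Collections.Generic; using System.Linq; using System.Windows.Controls;
// Children holds only the Grid's direct children, so nested panels need their own pass.
IEnumerable<Label> _collection = mainGrid.Children.OfType<Label>();
foreach (Label _control in _collection)
{
    // Content is an object and may be null
    string _text = _control.Content?.ToString();
    // add your code here
}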
The results can be placed in a resource file (for example, CSV) with your translations added. Another, more sophisticated, solution is to create a multilingual XML file or a local multilingual database, with all label contents in the first field, translations in the following fields, and so on.
Hope this may help.
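Since the question is about strings sitting in XAML files, the extraction can also be done offline by reading each file as XML. The sketch below is only an illustration under assumptions: XamlStringExtractor is a hypothetical name, and the attribute list (Content, Header, Text, Title, ToolTip) is a guess at where user-visible strings usually live. Values that start with "{" are skipped because they are bindings or other markup extensions, not literal text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of any library.
public static class XamlStringExtractor
{
    // Assumed set of attributes that typically carry user-visible text.
    private static readonly HashSet<string> TextAttributes =
        new HashSet<string> { "Content", "Header", "Text", "Title", "ToolTip" };

    // Returns the distinct literal strings found in one XAML document.
    public static List<string> ExtractStrings(string xaml)
    {
        return XDocument.Parse(xaml)
            .Descendants()
            .Attributes()
            .Where(a => TextAttributes.Contains(a.Name.LocalName))
            .Select(a => a.Value)
            // Skip empty values and markup extensions such as {Binding ...}.
            .Where(v => !string.IsNullOrWhiteSpace(v) && !v.StartsWith("{"))
            .Distinct()
            .ToList();
    }
}
```

Running this over every .xaml file in the project (for instance with File.ReadAllText) yields the list of strings to translate. Text written as element content rather than as an attribute is not picked up by this sketch.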

Compare Word documents (.docx) with a document template (.dotx)

Is there any way I can compare a Word document (.docx) with a document template (.dotx) generated in Microsoft Word?
I want to do this comparison programmatically using C#.
I want to compare both documents word by word so that I can determine which template the document belongs to. I don't just want to compare the sizes of the two files; I want to compare the contents as well.
From this comparison I want to get the following results:
Which document template the document was generated from.
Where in the document template a particular piece of information is stored.
Say, for example, I want to search for a person's contact information. Then I want to traverse the document and check at which position the template has the area/section for the address (i.e. top-left corner, top center, in a paragraph, in the body, etc.).
In the same way I want to extract other information too, such as links to other documents.
After getting those positions, I want to get that information from the .docx file.
Say I found that the address is in the top left and there are five links referring to other documents in five different paragraphs. Then I want to get the address and save it to a variable. After that, I want to replace those link placeholders with actual hyperlinks, i.e. if a link refers to Doc-A, then instead of just showing plain text I want to replace it with a hyperlink to Doc-A.
Any suggestions?
Thank You.
Your question is rather too vague and involved to give a really good answer, however...
To find out which template a document was generated from, the object model provides the property Document.AttachedTemplate, which returns the attached template as a Template object; its FullName property gives the full file name. This is certainly better than comparing word by word (which is also very time-consuming).
The Word object model also provides the method CompareDocuments (it belongs to the Word.Application class). This will "highlight" differences in the text content of two documents.
Links will be found in the Document.Hyperlinks collection.
Getting the position of things is a bit chancy with Word and it depends on what you really mean by "top-left", etc. Better would be to construct the templates using content controls, form fields and/or bookmarks so that you can uniquely identify important sections. However, Word does provide the Range.get_Information method that can return relative and absolute positions on the page if that's what you really want.

How to tell height of paragraph in OpenXML?

I'm generating an MS Word document from user data. The data is placed in a container which is serialized to XML, and the resulting XML is converted to OpenXML using XSLT. There are a few minor changes done programmatically in C# to generate the Word document, as they can't be done with XSLT.
There is a user requirement that an item be placed completely on one page without any associated data being split onto another page. Sometimes one item will fill up an entire page, and sometimes I can fit three or four items on one page (I need to insert a separator (horizontal rule) between items that fit on the same page.)
Is there a way to determine whether or not one item or OpenXML paragraph will fit entirely on the "current" page? This can be either via C# or XSLT, and I can work something out.
Unfortunately, the only way this can be reliably done is to actually render the output, including all of the font sizes, bolding, kerning and all that. Which means you have to do the pagination in Word, and then save it back to the OpenXML.

tabbing in C# resource file

How do I add a TAB (\t) to a string resource?
"\tText" doesn't work.
You have to explicitly add the tab in. The easiest way of doing this is probably to type out your string in Notepad (with the tab explicitly set in place rather than using an escape character) and copy and paste the text into the resource editor.
You will have a similar problem with newlines. The easiest way of adding them is, again, to add them explicitly, using the Shift+Enter key combination.
You have two options that I am aware of:
Do a string replace after reading your resource string: s = s.Replace("\\t","\t");
Enter the tab character directly into your resource string at creation time by typing Alt-009 (the ASCII code for tab) on the numeric keypad.
Articles on the same here and here.
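The first option can be sketched as follows. ResourceEscapes and its method names are hypothetical, and in real code the input would come from the generated Resources class or a ResourceManager lookup rather than a literal. Regex.Unescape is a swapped-in shortcut, not something the answer above suggests: it expands the usual escapes (\t, \n, \r and so on) in one call, but as far as I know it throws an ArgumentException when the string contains a backslash sequence it does not recognize.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class ResourceEscapes
{
    // Turns the two characters '\' 't' stored in the resource into a real tab.
    public static string ExpandTabs(string resourceValue)
    {
        return resourceValue.Replace("\\t", "\t");
    }

    // Expands all standard escape sequences in one pass.
    public static string ExpandAll(string resourceValue)
    {
        return Regex.Unescape(resourceValue);
    }
}
```

With this in place the resource can simply store Name:\tValue as typed, and the calling code passes the string through ExpandTabs before use.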
Use the Alt Code for Tab (Alt + 009)
Newlines are added using Shift + Return.
1) Open up resources file in VS.
2) Put cursor where you want the Tab character
3) Hold down Alt key
4) Press 0, 0, 9 on the numeric keypad.
5) Release the Alt key.
When you click off the resource string, you will see the tabs disappear from the display, but rest assured they are still there. This can be verified by opening Resources.Designer.cs, looking at the comment for the resource string, and highlighting the area where the tab was inserted.
It's nearly six years since this thread was last modified, and the recommendation to use escapes still rules the day. For what it's worth, earlier today, I copied some text from a C# string constant into the resource string editor, and the tab got replaced by spaces. However, since the code expected to see the actual tab character, it threw an InvalidOperationException (my code, my exception!). Once again, I fell back to the tab, following the excellent instructions in the DevX article, "Another Way to Escape Sequences in .NET Resource Files," mentioned in the second citation in the accepted answer.
Moral: Don't count on the Windows Clipboard to faithfully copy your text.
Have you tried the XML tab character?
Sorry my tab character didn't show! Must have got eaten up by the browser.
\t does add an ASCII tab, but if you are displaying this in an HTML page you will not see that tab except in the page source. HTML doesn't render tabs or newlines as extra space; any run of whitespace is collapsed to a single space character when displayed. Formatting HTML with whitespace is not recommended; that is what a div with CSS, or even a table, is for. If you must add extra white space in HTML, use the non-breaking space entity (&nbsp;) repeatedly, but it will not line up with tab stops and will create a nightmare if you ever copy and paste.
Alternately you can display your string data in a read-only Text Area. This will preserve your string format. Without knowing the specifics of what you are trying to do with your string or how you are creating it these are the best suggestions I can give you.
You can also create a variable, although \t works inline. (ConvertFromUtf32 already returns a string, so no ToString() call is needed.)
string TAB = char.ConvertFromUtf32(9);
