Sending Swedish and Chinese signs to Docx using OpenXML and RTF

Goal
Pass Swedish and Chinese characters to a DOCX file as RTF-formatted content.
Description
I need to dynamically generate an RTF-formatted string containing Swedish and Chinese characters and insert it into an existing DOCX file. I can handle the Swedish letters (åäö), but the Chinese characters are not displayed properly; they show up as ????
private void buttonSendDiaeresesToDocx_Click(object sender, EventArgs e)
{
var desktop = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
var filename = @"SpecialCharactersInDocx.docx";
var filepath = Path.Combine(desktop, filename);
//Dynamic content fetched from the database.
var content = "This should be Swedish and Chinese signs -> åäö - 部件名称";
var rtfEncodedString = new StringBuilder();
rtfEncodedString.Append(@"{\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard ");
rtfEncodedString.Append(content);
rtfEncodedString.Append(@"\par}");
removeExistingFile(filepath);
createEmptyDocx(filepath);
addRtfToWordDocument(filepath, rtfEncodedString.ToString());
openDocx(filepath);
}
private void addRtfToWordDocument(string filepath, string rtfEncodedString)
{
//Implemented as suggested at
//http://stackoverflow.com/a/14861397/1997617
using (WordprocessingDocument doc = WordprocessingDocument.Open(filepath, true))
{
string altChunkId = "AltChunkId1";
MainDocumentPart mainDocPart = doc.MainDocumentPart;
AlternativeFormatImportPart chunk = mainDocPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.Rtf, altChunkId);
using (MemoryStream ms = new MemoryStream(Encoding.Default.GetBytes(rtfEncodedString)))
{
chunk.FeedData(ms);
}
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainDocPart.Document.Body.ReplaceChild(
altChunk, mainDocPart.Document.Body.Elements<Paragraph>().Last());
mainDocPart.Document.Save();
}
}
I have tried different encodings for the memory stream (Default, ASCII, UTF8, GB18030, ...) but none seems to work. I have also tried converting the encoding of the rtfEncodedString variable before passing it to the addRtfToWordDocument method.
How do I make both the Swedish and the Chinese characters display properly in the document?
Notes and references
The above code snippet is the part of my solution that I think is relevant for the question. The entire code sample can be downloaded at http://www.bjornlarsson.se/externals/SpecialCharactersInDocx02.zip
The RTF format is needed in the real world application since the content is to be shown as a table (with bold text) in the document.

You could use WordPad to create the RTF string for you: open WordPad, paste your content, and save it to a file. Then open that file in a text editor to read the RTF.
Your RTF string then looks like this:
{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang1031{\fonttbl{\f0\fnil Consolas;}{\f1\fnil\fcharset0 Consolas;}{\f2\fnil\fcharset134 SimSun;}{\f3\fnil\fcharset0 Calibri;}}
{\*\generator Riched20 10.0.10586}\viewkind4\uc1
\pard\sa200\sl276\slmult1\f0\fs19\lang7 This should be Swedish and Chinese signs -> \f1\'e5\'e4\'f6 - \f2\'b2\'bf\'bc\'fe\'c3\'fb\'b3\'c6\f3\fs22\par
}
Maybe it helps. I tested the RTF string with your code and it works!
Generating the RTF string dynamically via RichTextBox:
private void buttonSendDiaeresesToDocx_Click(object sender, EventArgs e)
{
var desktop = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
var filename = @"SpecialCharactersInDocx.docx";
var filepath = Path.Combine(desktop, filename);
removeExistingFile(filepath);
createEmptyDocx(filepath);
var rtfEncodedString = new StringBuilder();
string contentOriginal = "This should be Swedish and Chinese signs -> åäö - 部件名称";
string rtfStart =
"{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1031{\\fonttbl{\\f0\\fnil\\fcharset0 Microsoft Sans Serif;}{\\f1\\fmodern\\fprq6\\fcharset134 SimSun;}}\r\n\\viewkind4\\uc1\\pard\\f0\\fs17 ";
RichTextBox rtfBox = new RichTextBox {Text = contentOriginal};
string content = rtfBox.Rtf;
content = content.Replace(rtfStart, "");
rtfEncodedString.Append(rtfStart);
rtfEncodedString.Append(content);
rtfEncodedString.Append(@"\par}");
addRtfToWordDocument(filepath, rtfEncodedString.ToString());
openDocx(filepath);
}
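If pulling in a RichTextBox just to produce the escapes is undesirable, the content can also be escaped by hand. The following is a minimal sketch, not part of the original answer: the class name RtfText and the method Escape are hypothetical. It assumes the reader honours RTF's \uN keyword, which takes a signed 16-bit decimal value followed by a fallback character. Whether Word then picks a font with Chinese glyphs on its own has not been verified here.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper (not from the original post): escapes plain text for an RTF body.
public static class RtfText
{
    public static string Escape(string text)
    {
        var sb = new StringBuilder(text.Length * 2);
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                // RTF control characters must be backslash-escaped.
                sb.Append('\\').Append(c);
            }
            else if (c <= 0x7f)
            {
                // Plain ASCII passes through unchanged.
                sb.Append(c);
            }
            else
            {
                // \uN expects a signed 16-bit value, so code units above 32767 wrap to negative.
                // The trailing '?' is the fallback shown by readers that ignore \u.
                short signed = unchecked((short)c);
                sb.Append("\\u")
                  .Append(signed.ToString(CultureInfo.InvariantCulture))
                  .Append('?');
            }
        }
        return sb.ToString();
    }
}
```

With this in place, rtfEncodedString.Append(RtfText.Escape(content)) yields a pure ASCII string, so the choice of encoding passed to the MemoryStream in addRtfToWordDocument should no longer matter.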

Related

Encoding issue with French language characters when creating RTF document using .NET/C#

The app is developed in .NET and reads an RTF document template that contains placeholders that require replacing with text currently stored in a SQL Server database. The app then saves the RTF doc with the substituted text. However, French characters read from the database, such as é are being displayed as Ã© in the RTF document.
The process is:
read the RTF doc
replace the placeholders with data from SQL Server db
save to new RTF doc
The key bits of the code I think are...
Read from RTF doc:
StringBuilder buffer;
using (StreamReader input = new StreamReader(pathToTemplate))
{
buffer = new StringBuilder(input.ReadToEnd());
}
Replace placeholder text with text from database:
buffer.Replace("$$placeholder$$", strFrenchCharsFromDb);
Save the edits as a new RTF doc:
byte[] fileBytes = System.Text.Encoding.UTF8.GetBytes(buffer.ToString());
File.WriteAllBytes(pathToNewRtfDoc, fileBytes);
When I debug buffer during "Save" the é character is present.
When I open the RTF after File.WriteAllBytes it contains Ã© instead.
I have tried specifying the encoding when creating the StreamReader but it was the same result.
i.e. using (StreamReader input = new StreamReader(pathToTemplate, Encoding.UTF8))

Apply the following method to the strFrenchCharsFromDb string before calling Replace():
buffer.Replace("$$placeholder$$", ConvertNonAsciiToEscaped(strFrenchCharsFromDb));
The ConvertNonAsciiToEscaped() method implementation:
/// <param name="rtf">An RTF string that can contain non-ASCII characters and should be converted to correct format before loading to the RichTextBox control.</param>
/// <returns>The source RTF string with converted non ASCII to escaped characters.</returns>
public string ConvertNonAsciiToEscaped(string rtf)
{
var sb = new StringBuilder();
foreach (var c in rtf)
{
if (c <= 0x7f)
sb.Append(c);
else
// RTF's \u keyword takes a signed 16-bit value, so code units above 32767 must wrap to negative.
sb.Append("\\u" + unchecked((short)c) + "?");
}
return sb.ToString();
}

I used the "Western European (ISO)" encoding method: System.Text.Encoding.GetEncoding("iso-8859-1")
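The Ã© symptom is what you would expect when UTF-8 bytes are interpreted as a single-byte Western code page, which is presumably what happens if the RTF template declares an ANSI code page (an assumption, since the template header is not shown). A small sketch that reproduces the effect; the class and method names are made up for illustration:

```csharp
using System.Text;

// Hypothetical demo (not from the original post): shows how "é" turns into "Ã©".
public static class MojibakeDemo
{
    public static string DecodeUtf8BytesAsLatin1(string text)
    {
        // "é" (U+00E9) becomes the two bytes C3 A9 in UTF-8.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);

        // Reading those bytes back as ISO-8859-1 maps each byte to one character.
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes);
    }
}
```

Writing the file with Encoding.GetEncoding("iso-8859-1") instead of Encoding.UTF8 stores é as the single byte E9, which matches what an ANSI RTF reader expects. Escaping to \uN? as in the answer above avoids the dependency on the code page altogether.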

add html content in existing docx file using openxml in C#

How do I add/append HTML content in an existing .docx file, using OpenXML in asp.net C#?
In an existing word file, I want to append the html content part.
For example:
In this example, I want to place "This is a Heading" inside a H1 tag.
Here is my code:
protected void Button1_Click(object sender, EventArgs e)
{
try
{
using (WordprocessingDocument doc = WordprocessingDocument.Open(@"C:\Users\admin\Downloads\WordGenerator\WordGenerator\FTANJS.docx", true))
{
string altChunkId = "myId";
MainDocumentPart mainDocPart = doc.MainDocumentPart;
var run = new Run(new Text("test"));
var p = new Paragraph(new ParagraphProperties(new Justification() { Val = JustificationValues.Center }), run);
var body = mainDocPart.Document.Body;
body.Append(p);
MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<html><head></head><body><h1>HELLO</h1></body></html>"));
// Uncomment the following line to create an invalid word document.
// MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<h1>HELLO</h1>"));
// Create alternative format import part.
AlternativeFormatImportPart formatImportPart =
mainDocPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.Html, altChunkId);
//ms.Seek(0, SeekOrigin.Begin);
// Feed HTML data into format import part (chunk).
formatImportPart.FeedData(ms);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainDocPart.Document.Body.Append(altChunk);
}
}
catch (Exception ex)
{
ex.ToString ();
}
}

Adding the HTML content as a chunk should work, and you are almost there.
If I understand the question properly, this code should work.
//insert html content to H1 tag
using(WordprocessingDocument fDocx = WordprocessingDocument.Open(sDocxFile,true))
{
string sChunkID = "myhtmlID";
AlternativeFormatImportPart oChunk = fDocx.MainDocumentPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, sChunkID);
using(FileStream fs = File.Open(sHtml,FileMode.OpenOrCreate))
{
oChunk.FeedData(fs);
}
AltChunk oAltChunk = new AltChunk();
oAltChunk.Id =sChunkID ;
//insert html to the tag of 'H1' and remove H1.
Body body = fDocx.MainDocumentPart.Document.Body;
Paragraph theParagraph = body.Descendants<Paragraph>().Where(p => p.InnerText == "H1").FirstOrDefault();
theParagraph.InsertAfterSelf<AltChunk>(oAltChunk);
theParagraph.Remove();
fDocx.MainDocumentPart.Document.Save();
}

The short answer is "You can't add HTML to a docx file".
Docx is an open format (Office Open XML, standardized as ECMA-376). If you're using the Microsoft version, it has a number of extensions.
In any case, the file contains XML, not HTML and you can't simply add HTML to a docx file. There are styles and formatting objects and pointers that all need to be updated.
If you need to modify a docx file and don't want to do a lot of research and a lot of coding, you'll need to find an existing library to work with.

How to parse a file into a list of string to use for auto-complete in a windows forms application in C#?

I am building an application in C# that has a textbox field. In this field, a user will write text and the text will autocomplete from a file found on a remote repository. I am using a library called SharpSVN and I am trying to find a method where I can fetch that file from the repository based on a certain path I provide, then parse the content into strings that will be added to the list in the autocomplete of the textbox mentioned previously.

There are two ways:
Download the file text using the repository URL. If you want the file as it appeared at a specific revision, try appending "?r=12345" to the URL:
string fileText = new WebClient().DownloadString("https://myrepo.com/myfile.txt");
Or, you could also use SharpSVN, removing the revision options if you want the latest version:
public string GetFileContentsAsString(long revisionNumber)
{
return new StreamReader(GetFileContents(revisionNumber)).ReadToEnd();
}
private MemoryStream GetFileContents(long revisionNumber)
{
SvnRevision rev = new SvnRevision(revisionNumber);
MemoryStream stream = new MemoryStream();
using (SvnClient client = GetClient())
{
client.FileVersions(SvnTarget.FromUri(RemotePath), new SvnFileVersionsArgs() { Start = rev, End = rev }, (s, e) =>
{
e.WriteTo(stream);
});
}
stream.Position = 0;
return stream;
}
When you have the file as a text string, you can use .NET's String.Split() method to split the text into a list of lines using the '\n' line-break character as the delimiter:
string[] fileAsLines = fileText.Split(new char[] {'\n'});
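Splitting on '\n' alone leaves a trailing '\r' on each line if the file uses Windows line endings, and it keeps blank lines. A slightly more defensive sketch follows; the class name AutoCompleteEntries and the method ParseEntries are hypothetical and not part of the original answer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not from the original post): turns file text into autocomplete entries.
public static class AutoCompleteEntries
{
    public static List<string> ParseEntries(string fileText)
    {
        return fileText
            // Accept Windows, Unix and old Mac line endings, and drop empty lines.
            .Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim())
            // Drop lines that contained only whitespace.
            .Where(line => line.Length > 0)
            // Avoid duplicate suggestions.
            .Distinct()
            .ToList();
    }
}
```

In a Windows Forms application the result could then be handed to the textbox, for example via textBox.AutoCompleteCustomSource.AddRange(entries.ToArray()) with AutoCompleteSource set to CustomSource and AutoCompleteMode set to SuggestAppend.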

Reading filecontents into webbrowser control strips away my umlauts

I'm using
private void infoLink_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
{
string filename = "infoFiles\\Tastenkürzel.htm";
System.IO.StreamReader infoFile = new System.IO.StreamReader(filename);
string page = infoFile.ReadToEnd();
frmInfo infoForm = new frmInfo(page);
infoForm.Show();
}
to open an html file containing umlauts. When I open the file in a browser, it shows all the dots above letters I want it to, however, once I open the filecontents in the webbrowser control, the umlauts are replaced by little boxes.
Thanks in advance!

Solving the problem required adding an Encoding parameter.
System.IO.StreamReader infoFile = new System.IO.StreamReader(filename, Encoding.UTF8);

how can I put a content in a mergefield in docx

I'm developing a web application with asp.net and I have a file called Template.docx that works like a template to generate other reports. Inside this Template.docx I have some MergeFields (Title, CustomerName, Content, Footer, etc) to replace for some dynamic content in C#.
I would like to know how I can put content into a merge field in a DOCX file.
I don't know whether MergeFields are the right way to do this or whether there is another way. Any suggestions are appreciated!
PS: I have openxml referenced in my web application.
Edits:
private MemoryStream LoadFileIntoStream(string fileName)
{
MemoryStream memoryStream = new MemoryStream();
using (FileStream fileStream = File.OpenRead(fileName))
{
memoryStream.SetLength(fileStream.Length);
fileStream.Read(memoryStream.GetBuffer(), 0, (int) fileStream.Length);
memoryStream.Flush();
fileStream.Close();
}
return memoryStream;
}
public MemoryStream GenerateWord()
{
string templateDoc = "C:\\temp\\template.docx";
string reportFileName = "C:\\temp\\result.docx";
var reportStream = LoadFileIntoStream(templateDoc);
// Copy a new file name from template file
//File.Copy(templateDoc, reportFileName, true);
// Open the new Package
Package pkg = Package.Open(reportStream, FileMode.Open, FileAccess.ReadWrite);
// Specify the URI of the part to be read
Uri uri = new Uri("/word/document.xml", UriKind.Relative);
PackagePart part = pkg.GetPart(uri);
XmlDocument xmlMainXMLDoc = new XmlDocument();
xmlMainXMLDoc.Load(part.GetStream(FileMode.Open, FileAccess.Read));
// replace some keys inside xml (it will come from database, it's just a test)
xmlMainXMLDoc.InnerXml = xmlMainXMLDoc.InnerXml.Replace("field_customer", "My Customer Name");
xmlMainXMLDoc.InnerXml = xmlMainXMLDoc.InnerXml.Replace("field_title", "Report of Documents");
xmlMainXMLDoc.InnerXml = xmlMainXMLDoc.InnerXml.Replace("field_content", "Content of Document");
// Open the stream to write document
StreamWriter partWrt = new StreamWriter(part.GetStream(FileMode.Open, FileAccess.Write));
//doc.Save(partWrt);
xmlMainXMLDoc.Save(partWrt);
partWrt.Flush();
partWrt.Close();
reportStream.Flush();
pkg.Close();
return reportStream;
}
PS: When I convert MemoryStream to a file, I got a corrupted file. Thanks!

I know this is an old post, but I could not get the accepted answer to work for me. The project linked would not even compile (which someone has already commented in that link). Also, it seems to use other Nuget packages like WPFToolkit.
So I'm adding my answer here in case someone finds it useful. This only uses the OpenXML SDK 2.5 and also the WindowsBase v4. This works on MS Word 2010 and later.
string sourceFile = @"C:\Template.docx";
string targetFile = @"C:\Result.docx";
File.Copy(sourceFile, targetFile, true);
using (WordprocessingDocument document = WordprocessingDocument.Open(targetFile, true))
{
// If your sourceFile is a different type (e.g., .DOTX), you will need to change the target type like so:
document.ChangeDocumentType(WordprocessingDocumentType.Document);
// Get the MainPart of the document
MainDocumentPart mainPart = document.MainDocumentPart;
var mergeFields = mainPart.RootElement.Descendants<FieldCode>();
var mergeFieldName = "SenderFullName";
var replacementText = "John Smith";
ReplaceMergeFieldWithText(mergeFields, mergeFieldName, replacementText);
// Save the document
mainPart.Document.Save();
}
private void ReplaceMergeFieldWithText(IEnumerable<FieldCode> fields, string mergeFieldName, string replacementText)
{
var field = fields
.Where(f => f.InnerText.Contains(mergeFieldName))
.FirstOrDefault();
if (field != null)
{
// Get the Run that contains our FieldCode
// Then get the parent container of this Run
Run rFldCode = (Run)field.Parent;
// Get the three (3) other Runs that make up our merge field
Run rBegin = rFldCode.PreviousSibling<Run>();
Run rSep = rFldCode.NextSibling<Run>();
Run rText = rSep.NextSibling<Run>();
Run rEnd = rText.NextSibling<Run>();
// Get the Run that holds the Text element for our merge field
// Get the Text element and replace the text content
Text t = rText.GetFirstChild<Text>();
t.Text = replacementText;
// Remove all the four (4) Runs for our merge field
rFldCode.Remove();
rBegin.Remove();
rSep.Remove();
rEnd.Remove();
}
}
What the code above does is basically this:
Identify the 4 Runs that make up the merge field named "SenderFullName".
Identify the Run that contains the Text element for our merge field.
Remove the 4 Runs.
Update the text property of the Text element for our merge field.
UPDATE
For anyone interested, here is a simple static class I used to help me with replacing merge fields.

Frank Fajardo's answer was 99% of the way there for me, but it is important to note that MERGEFIELDS can be SimpleFields or FieldCodes.
In the case of SimpleFields, the text runs displayed to the user in the document are children of the SimpleField.
In the case of FieldCodes, the text runs shown to the user are between the runs containing FieldChars with the Separate and the End FieldCharValues. Occasionally, several text containing runs exist between the Separate and End Elements.
The code below deals with these problems. Further details of how to get all the MERGEFIELDS from the document, including the header and footer is available in a GitHub repository at https://github.com/mcshaz/SimPlanner/blob/master/SP.DTOs/Utilities/OpenXmlExtensions.cs
private static Run CreateSimpleTextRun(string text)
{
Run returnVar = new Run();
RunProperties runProp = new RunProperties();
runProp.Append(new NoProof());
returnVar.Append(runProp);
returnVar.Append(new Text() { Text = text });
return returnVar;
}
private static void InsertMergeFieldText(OpenXmlElement field, string replacementText)
{
var sf = field as SimpleField;
if (sf != null)
{
var textChildren = sf.Descendants<Text>();
textChildren.First().Text = replacementText;
foreach (var others in textChildren.Skip(1))
{
others.Remove();
}
}
else
{
var runs = GetAssociatedRuns((FieldCode)field);
var rEnd = runs[runs.Count - 1];
foreach (var r in runs
.SkipWhile(r => !r.ContainsCharType(FieldCharValues.Separate))
.Skip(1)
.TakeWhile(r=>r!= rEnd))
{
r.Remove();
}
rEnd.InsertBeforeSelf(CreateSimpleTextRun(replacementText));
}
}
private static IList<Run> GetAssociatedRuns(FieldCode fieldCode)
{
Run rFieldCode = (Run)fieldCode.Parent;
Run rBegin = rFieldCode.PreviousSibling<Run>();
Run rCurrent = rFieldCode.NextSibling<Run>();
var runs = new List<Run>(new[] { rBegin, rCurrent });
while (!rCurrent.ContainsCharType(FieldCharValues.End))
{
rCurrent = rCurrent.NextSibling<Run>();
runs.Add(rCurrent);
}
return runs;
}
private static bool ContainsCharType(this Run run, FieldCharValues fieldCharType)
{
var fc = run.GetFirstChild<FieldChar>();
return fc == null
? false
: fc.FieldCharType.Value == fieldCharType;
}

You could try http://www.codeproject.com/KB/office/Fill_Mergefields.aspx which uses the Open XML SDK to do this.
