Using OpenXML to replace text in the document

I open a document and copy it to a stream.
How can I replace some text in the document before it is copied to the stream?
//wordTemplate - var with path to my word template
byte[] result = null;
byte[] templateBytes = System.IO.File.ReadAllBytes(wordTemplate);
using (MemoryStream templateStream = new MemoryStream())
{
    templateStream.Write(templateBytes, 0, (int)templateBytes.Length);
    using (WordprocessingDocument doc = WordprocessingDocument.Open(templateStream, true))
    {
        MainDocumentPart mainPart = doc.MainDocumentPart;
        mainPart.Document.Save();
        templateStream.Position = 0;
        using (MemoryStream memoryStream = new MemoryStream())
        {
            templateStream.CopyTo(memoryStream);
            result = memoryStream.ToArray();
        }
    }
}

Instead of calling File.ReadAllBytes(), why not call File.ReadAllLines(string)? It gives you an array of strings, which you can then search and replace to your heart's delight.
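One caveat on reading the file as lines: a .docx file is a ZIP package rather than plain text, so line-based reading is unlikely to yield usable strings. A minimal sketch of an alternative follows, assuming the Open XML SDK (DocumentFormat.OpenXml) is referenced. The names TemplateFiller and ReplaceText are hypothetical, and a placeholder that Word has split across several runs would not be matched by this simple approach.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TemplateFiller
{
    // Hypothetical helper: replaces a placeholder inside the text elements
    // of the main document part and returns the modified .docx bytes.
    public static byte[] ReplaceText(byte[] templateBytes, string placeholder, string replacement)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            // Copy into an expandable stream so the package can grow.
            stream.Write(templateBytes, 0, templateBytes.Length);

            using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
            {
                foreach (Text text in doc.MainDocumentPart.Document.Descendants<Text>())
                {
                    if (text.Text.Contains(placeholder))
                    {
                        text.Text = text.Text.Replace(placeholder, replacement);
                    }
                }
                doc.MainDocumentPart.Document.Save();
            }

            // Read the bytes only after the document has been disposed,
            // so that all pending changes are written to the stream.
            return stream.ToArray();
        }
    }
}
```

With the question's variables, this could be called as `result = TemplateFiller.ReplaceText(System.IO.File.ReadAllBytes(wordTemplate), "BBB", "CCC!");`, where the placeholder and replacement strings are only examples.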

Related

Convert itext7 PdfDocument to byte array in c#

I am using itext7 pdfHTML (4.0.3) to convert HTML to PDF in memory. The method below takes HTML in memory and returns an itext7 PdfDocument object. I need to convert that PdfDocument object to a byte array or stream.
Please let me know how to achieve that.
private iText.Kernel.Pdf.PdfDocument CreatePdf(string html)
{
    byte[] bytes = Encoding.ASCII.GetBytes(html);
    ConverterProperties properties = new ConverterProperties();
    properties.SetBaseUri(path);
    MemoryStream myMemoryStream = new MemoryStream(bytes);
    PdfWriter writer = new(myMemoryStream);
    iText.Kernel.Pdf.PdfDocument pdf = new iText.Kernel.Pdf.PdfDocument(writer);
    pdf.SetDefaultPageSize(PageSize.A4);
    pdf.SetTagged();
    HtmlConverter.ConvertToDocument(html, pdf, properties);
    return pdf;
}

To @mkl's point, you're kind of overdoing it. Here's a simple example:
var html = "<h1>hi mom</h1>";
byte[] result;
using (var memoryStream = new MemoryStream())
{
    var pdf = new PdfDocument(new PdfWriter(memoryStream));
    pdf.SetDefaultPageSize(PageSize.A4);
    pdf.SetTagged();
    HtmlConverter.ConvertToPdf(html, pdf, new ConverterProperties());
    result = memoryStream.ToArray();
}
File.WriteAllBytes(@"/tmp/file.pdf", result);
The memoryStream will hold the in-memory representation of your conversion. I've added the WriteAllBytes call just so you can see the result for yourself.
Another note: if you do not need to set any PdfDocument properties, you can use an even simpler version:
var html = "<h1>hi mom</h1>";
byte[] result;
using (var memoryStream = new MemoryStream())
{
    HtmlConverter.ConvertToPdf(html, memoryStream);
    result = memoryStream.ToArray();
}
File.WriteAllBytes(@"/tmp/file.pdf", result);

Convert XML to word template using Microsoft.Office.Interop.Word

I want to convert XML to a Word template, but with the code below, when I open the Word document it shows all the text like this:
(PK ! #qu é  [Content_Types].xml ¢(  Í•ËNÃ0E÷HüCämÕ¸ejÚ%T¢|€‰'©…c[÷‘¿gÒ´¡Ò ÚÒn"93÷Þck¤Œ–…ŽæàQY“°~Üc˜ÔJeò„½N»7,Â ŒÚHX ÈFÃË‹Á¤t€©
&l‚»åÓ)cëÀP%³¾Ž>çN¤ï" ~Õë]óÔš &tCåÁ†ƒ{ÈÄL‡èaI¿kYtW7VY Îi•Š#u>7ò[Jw“rÕƒSå°C)
My code looks like this:
string rootPath = @"xmlpath";
string templateDocument = @"mytemplateword documentpath";
string outputDocument = @"myoutputdocumentpath";
// word.Document wordDoc = wordApp.doc
// using (WordprocessingDocument doc =
// WordprocessingDocument.Create(xmlDataFile, WordprocessingDocumentType.Document,true))
//{
// MainDocumentPart mainPart = doc.AddMainDocumentPart();
// mainPart.Document = new Document();
// Body body = mainPart.Document.AppendChild(new Body());
// SectionProperties props = new SectionProperties();
// body.AppendChild(props);
//}
//Document doc = new Document();
string result = "";
MemoryStream mStream = new MemoryStream();
XmlTextWriter writer = new XmlTextWriter(mStream, System.Text.Encoding.Unicode);
XmlDocument document = new XmlDocument();
// XmlNodeList PatientFirst = xmlDoc.GetElementsByTagName("PatientFirst");
// XmlNodeList PatientSignatureImg = xmlDoc.GetElementsByTagName("PatientSignatureImg");
byte[] byteArray = System.IO.File.ReadAllBytes(templateDocument);
using (MemoryStream mem = new MemoryStream())
{
    mem.Write(byteArray, 0, (int)byteArray.Length);
    using (WordprocessingDocument Doc = WordprocessingDocument.Open(mem, true))
    {
        using (StreamReader reader = new StreamReader(Doc.MainDocumentPart.GetStream(FileMode.Create)))
        {
            documentText = reader.ReadToEnd();
        }
        using (StreamWriter docWriter = new StreamWriter(Doc.MainDocumentPart.GetStream(FileMode.Create)))
        {
            docWriter.Write(documentText);
        }
    }
    System.IO.File.WriteAllBytes(outputDocument, mem.ToArray());
}

iTextSharp (version 4.1.6) - add text/table at top of existing PDF

I have a pdf document (created by iTextSharp - free version 4.1.6) and I want to add text / table at the top of this pdf. I have tried to create two memory streams from iTextSharp Documents and combine them to one, see my code below. But the new PDF file cannot be opened. Any ideas what I am doing wrong? Any other ideas to add text / table at the top of an existing PDF? Thanks in advance!
public void CreateTestPDF(string _pathOfOriginalPDF, string _pathOfModifiedPDF)
{
    string oldFile = _pathOfOriginalPDF;
    string newFile = _pathOfModifiedPDF;
    byte[] bytesHeader;
    byte[] bytesBody;
    byte[] bytesCombined;
    using (MemoryStream ms = new MemoryStream())
    {
        Document doc = new Document();
        doc.Open();
        doc.Add(new Paragraph("This is my header paragraph"));
        if (doc.IsOpen())
        {
            doc.Close();
        }
        bytesHeader = ms.ToArray();
    }
    using (MemoryStream ms = new MemoryStream())
    {
        Document doc = new Document();
        //doc.Open();
        PdfWriter writer = PdfWriter.GetInstance(doc, new FileStream(oldFile, FileMode.Create));
        if (doc.IsOpen())
        {
            doc.Close();
        }
        bytesBody = ms.ToArray();
    }
    IEnumerable<byte> iCombined = bytesHeader.Concat(bytesBody);
    bytesCombined = iCombined.ToArray();
    string testFile = _pathOfModifiedPDF;
    using (FileStream fs = File.Create(testFile))
    {
        fs.Write(bytesBody, 0, (int)bytesBody.Length);
    }
}
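A note on why the result cannot be opened: concatenating the bytes of two PDF files does not produce a valid PDF, and in the code above neither MemoryStream is ever attached to a PdfWriter, so both byte arrays stay empty. A minimal sketch of a different technique follows, stamping text onto the first page with PdfReader and PdfStamper. It assumes iTextSharp 4.1.6 is referenced; PdfHeaderStamper, AddHeaderText and the 36-point margin are hypothetical choices. It draws over the existing page and does not push the original content down, so it only fits when the page has free space at the top.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfHeaderStamper
{
    // Hypothetical helper: writes one line of text near the top-left
    // corner of page 1 and returns the modified PDF as a byte array.
    public static byte[] AddHeaderText(byte[] originalPdf, string headerText)
    {
        PdfReader reader = new PdfReader(originalPdf);
        using (MemoryStream output = new MemoryStream())
        {
            PdfStamper stamper = new PdfStamper(reader, output);

            // Assumes the page's lower-left corner is at (0, 0).
            Rectangle pageSize = reader.GetPageSize(1);
            float x = 36f;
            float y = pageSize.Height - 36f;

            // Draw on the layer above the existing page content.
            PdfContentByte canvas = stamper.GetOverContent(1);
            ColumnText.ShowTextAligned(canvas, Element.ALIGN_LEFT,
                new Phrase(headerText), x, y, 0f);

            // Closing the stamper finishes writing the PDF to the stream.
            stamper.Close();
            reader.Close();
            return output.ToArray();
        }
    }
}
```

The result could then be saved with `File.WriteAllBytes(_pathOfModifiedPDF, PdfHeaderStamper.AddHeaderText(File.ReadAllBytes(_pathOfOriginalPDF), "This is my header paragraph"));`.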

Convert docx to byte[] and save it to the disk using File.WriteAllBytes

I am reading a docx file using the DocumentFormat.OpenXml library.
I am manipulating the file and need to write it to disk.
Doing this with the Open XML library is a no-brainer; the problem is that I need to pass the file content (byte[]) to a different API in my code, and that API handles the save operation.
This API uses File.WriteAllBytes. When I try to save my file via File.WriteAllBytes, I get XML inside the doc instead of the doc's real content.
How can I extract the byte[] from the doc and save it to disk using File.WriteAllBytes?
var path = "path/to/doc.docx";
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(path, true))
{
    string docText = null;
    using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
    {
        docText = sr.ReadToEnd();
    }
    docText = new Regex("BBB").Replace(docText, "CCC!");
    // here I will manipulate docText
    MemoryStream ms = new MemoryStream();
    using (WordprocessingDocument wordDocument =
        WordprocessingDocument.Create(ms, WordprocessingDocumentType.Document, true))
    {
        MainDocumentPart mainPart = wordDocument.AddMainDocumentPart();
        Body body = new Body(new Paragraph(new Run(new Text(docText))));
        mainPart.Document = new Document(body);
    }
    File.WriteAllBytes("path/to/cloned.docx", ms.ToArray());
}

This should do the trick
(tested with SampleDoc.docx from GitHub):
var path = @"path/to/doc.docx";
byte[] byteArray = File.ReadAllBytes(path);
using (MemoryStream stream = new MemoryStream())
{
    stream.Write(byteArray, 0, (int)byteArray.Length);
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(stream, true))
    {
        Body body = wordDoc.MainDocumentPart.Document.Body;
        foreach (var text in body.Descendants<Text>())
        {
            text.Text = text.Text.Replace("BBB", "CCC!");
        }
        wordDoc.Close();
    }
    File.WriteAllBytes(path + ".mod.docx", stream.ToArray());
}

It looks like you want to manipulate the raw XML, so you actually shouldn't use DocumentFormat.OpenXml at all--just treat your docx like the raw ZIP file that it is. Here is some sample code:
using System.IO;
using System.IO.Compression;
public static byte[] Change(string path)
{
    // Make a temporary directory
    var myTempDir = new DirectoryInfo(Path.Join(Path.GetTempPath(), Path.GetRandomFileName()));
    myTempDir.Create();
    // Extract all the XML files in the docx to that temporary directory
    using (ZipArchive zipArchive = ZipFile.OpenRead(path))
        zipArchive.ExtractToDirectory(myTempDir.FullName);
    // Read in the main document XML
    FileInfo docFile = new FileInfo(Path.Join(myTempDir.FullName, "word", "document.xml"));
    string rawXML = File.ReadAllText(docFile.FullName);
    // Manipulate it-- warning, this could break the whole thing
    rawXML = rawXML.Replace("winter", "spring");
    // Save the manipulated xml back over the old file
    docFile.Delete();
    File.WriteAllText(docFile.FullName, rawXML);
    // Zip our temporary directory back into a docx file
    FileInfo tempFile = new FileInfo(Path.GetTempFileName());
    // GetTempFileName creates an empty file, and CreateFromDirectory
    // throws if the destination already exists, so remove it first
    tempFile.Delete();
    ZipFile.CreateFromDirectory(myTempDir.FullName, tempFile.FullName);
    // Read the raw bytes in from our new file
    byte[] rawBytes = File.ReadAllBytes(tempFile.FullName);
    return rawBytes;
}
You might want to delete all those temp files, too--but I'll leave that part to you.
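If the temp files are a concern, the same raw-ZIP idea can be kept entirely in memory. The following is a minimal sketch under that assumption, using only System.IO.Compression; DocxRawEdit and ReplaceInDocumentXml are hypothetical names. As with the version above, a plain string replace on the XML can break the document, and text that Word has split across runs will not match.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class DocxRawEdit
{
    // Hypothetical helper: edits word/document.xml inside the docx
    // package without extracting anything to disk.
    public static byte[] ReplaceInDocumentXml(byte[] docxBytes, string oldValue, string newValue)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            // Copy into an expandable stream; Update mode needs to rewrite it.
            stream.Write(docxBytes, 0, docxBytes.Length);

            using (ZipArchive archive = new ZipArchive(stream, ZipArchiveMode.Update, leaveOpen: true))
            {
                ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
                if (entry == null)
                {
                    throw new InvalidDataException("word/document.xml not found in package.");
                }

                // Read the main document XML.
                string rawXml;
                using (StreamReader reader = new StreamReader(entry.Open(), Encoding.UTF8))
                {
                    rawXml = reader.ReadToEnd();
                }

                rawXml = rawXml.Replace(oldValue, newValue);

                // Truncate the entry and write the modified XML back (no BOM).
                using (Stream entryStream = entry.Open())
                {
                    entryStream.SetLength(0);
                    using (StreamWriter writer = new StreamWriter(entryStream, new UTF8Encoding(false)))
                    {
                        writer.Write(rawXml);
                    }
                }
            }

            // The archive is written back to the stream when it is disposed.
            return stream.ToArray();
        }
    }
}
```

A call such as `DocxRawEdit.ReplaceInDocumentXml(File.ReadAllBytes(path), "winter", "spring")` would return bytes that can be passed straight to File.WriteAllBytes.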

Html Text Content to Word using OpenXml

I have a rich text box that contains HTML-formatted text and into which copied images can also be inserted. I tried the AlternativeFormatImportPart and AltChunk approach. It generates the document, but I get the error below. Please let me know what I am missing here.
MemoryStream ms;// = new MemoryStream(new UTF8Encoding(true).GetPreamble().Concat(Encoding.UTF8.GetBytes(h)).ToArray());
ms = new MemoryStream(HtmlToWord(fileContent));
//MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(h));
// Create alternative format import part.
AlternativeFormatImportPart chunk =
    mainDocPart.AddAlternativeFormatImportPart(
        "application/xhtml+xml", altChunkId);
chunk.FeedData(ms);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
public static byte[] HtmlToWord(String html)
{
    const string filename = "test.docx";
    if (File.Exists(filename)) File.Delete(filename);
    var doc = new Document();
    using (MemoryStream generatedDocument = new MemoryStream())
    {
        using (WordprocessingDocument package = WordprocessingDocument.Create(
            generatedDocument, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = package.MainDocumentPart;
            if (mainPart == null)
            {
                mainPart = package.AddMainDocumentPart();
                new Document(new Body()).Save(mainPart);
            }
            HtmlConverter converter = new HtmlConverter(mainPart);
            converter.ExcludeLinkAnchor = true;
            converter.RefreshStyles();
            converter.ImageProcessing = ImageProcessing.AutomaticDownload;
            //converter.BaseImageUrl = new Uri(domainNameURL + "Images/");
            converter.ConsiderDivAsParagraph = false;
            Body body = mainPart.Document.Body;
            var paragraphs = converter.Parse(html);
            for (int i = 0; i < paragraphs.Count; i++)
            {
                body.Append(paragraphs[i]);
            }
            mainPart.Document.Save();
        }
        return generatedDocument.ToArray();
    }
}

There are some issues with AlternativeFormatImportPart when it is fed from a MemoryStream: the document does not get formatted well. So I followed an alternative approach: using the HtmlToWord method, I saved the HTML content to a Word file, then read the file content with a FileStream and fed it to the AlternativeFormatImportPart.
string docFileName;
HtmlToWord(fileContent, out docFileName);
FileStream fileStream = File.Open(docFileName, FileMode.Open);
// Create alternative format import part.
AlternativeFormatImportPart chunk = mainDocPart.AddAlternativeFormatImportPart(
    AlternativeFormatImportPartType.WordprocessingML, altChunkId);
chunk.FeedData(fileStream);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
