Changing Word Document XML - c#

I am trying to change the value of a Microsoft Office Word document's XML and save it as a new file. I know there are SDKs that would make this easier, but the project I am tasked with maintaining does things this way and I was told I had to as well.
I have a basic test document with two placeholders mapped to the following XML:
<root>
<element>
Fubar
</element>
<second>
This is the second placeholder
</second>
</root>
In my test project I have the following:
string strRelRoot = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
//the word template
byte[] buffer = File.ReadAllBytes("dev.docx");
MemoryStream stream = new MemoryStream(buffer, true);
Package package = Package.Open(stream, FileMode.Open, FileAccess.ReadWrite);
//get the document relationship
PackageRelationshipCollection pkgrcOfficeDocument = package.GetRelationshipsByType(strRelRoot);
//iterate through relationship
foreach (PackageRelationship pkgr in pkgrcOfficeDocument)
{
if (pkgr.SourceUri.OriginalString == "/")
{
//uri for the custom xml
Uri uriData = new Uri("/customXML/item1.xml", UriKind.Relative);
//delete the existing xml if it exists
if (package.PartExists(uriData))
{
// Delete template "/customXML/item1.xml" part
package.DeletePart(uriData);
}
PackagePart pkgprtData = package.CreatePart(uriData, "application/xml");
//hard coded test data
string xml = @"<root>
<element>
Changed
</element>
<second>
The second placeholder changed
</second>
</root>";
Stream fromStream = pkgprtData.GetStream();
//write the string
fromStream.Write(Encoding.UTF8.GetBytes(xml),0,xml.Length);
//destination file
Stream dest = File.Create("test.docx");
//write to the destination file
for (int a = fromStream.ReadByte(); a != -1; a = fromStream.ReadByte())
{
dest.WriteByte((byte)a);
}
}
}
What is happening right now is the file test.docx is being created but it is a blank document. I'm not sure why this is happening. Any suggestions anyone could offer on this approach and/or what I am doing incorrectly would be very much appreciated. Thanks much!

After your fromStream.Write call, the stream pointer is positioned after the data you've just written. So your first call to fromStream.ReadByte is already at the end of the stream, and you read (and write) nothing.
You need to either Seek to the beginning of the stream after writing (if the stream returned by the package supports seeking), or close fromStream (to ensure the data you've written is flushed) and reopen it for reading.
fromStream.Seek(0L, SeekOrigin.Begin);
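The stream-position behaviour described above can be reproduced without a Word package. The following is a minimal sketch that uses a plain MemoryStream as a stand-in for the part stream; the class and method names are hypothetical. It also passes the length of the byte array to Write rather than the length of the string, since the two differ when the text contains non-ASCII characters.

```csharp
using System.IO;
using System.Text;

public static class SeekDemo
{
    // Returns { value read right after writing, value read after rewinding }.
    public static int[] ReadAfterWrite()
    {
        var stream = new MemoryStream();
        byte[] bytes = Encoding.UTF8.GetBytes("<root/>");

        // Writing leaves the position just past the written data.
        stream.Write(bytes, 0, bytes.Length);

        // Reading now starts at the end of the stream, so ReadByte reports -1.
        int beforeSeek = stream.ReadByte();

        // Rewind to the start, then read the first byte that was written.
        stream.Seek(0L, SeekOrigin.Begin);
        int afterSeek = stream.ReadByte();

        return new[] { beforeSeek, afterSeek };
    }
}
```

The first value is the end-of-stream marker and the second is the byte for '<', which is why the copy loop in the question never writes anything.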

XSLT transform in C# - how to get valid URLs for included documents?

I'm transforming XML using XSLT sheet. The sheet consists of several files, which are included like this:
<xsl:include href="tokens.xsl"/>
<xsl:include href="glayout.xsl"/>
<xsl:include href="scripts.xsl"/>
<xsl:include href="tables.xsl"/>
<xsl:include href="entities.xsl"/>
<xsl:include href="cmarkup.xsl"/>
The transform code looks like following:
// Load text
var reader = XmlReader.Create(new StringReader(text));
// Load transform
XslCompiledTransform myXslTrans = new XslCompiledTransform();
using (var fs = new FileStream(result.FileName, FileMode.Open, FileAccess.Read))
{
var xmlReader = XmlReader.Create(fs);
myXslTrans.Load(xmlReader);
}
// Perform transformation
MemoryStream ms = new MemoryStream();
XmlTextWriter writer = new XmlTextWriter(ms, Encoding.UTF8);
myXslTrans.Transform(reader, null, writer);
// Recover result to string
ms.Seek(0, SeekOrigin.Begin);
var textReader = new StreamReader(ms);
string transformed = textReader.ReadToEnd();
The transform fails on the include lines. I found out that I can supply my own resolver to provide the missing documents, but since their URLs are relative, they get appended to the current application's folder, like:
D:\Dokumenty\Dev\VS\Dev.Editor\Dev.Editor\bin\Debug\tokens.xsl
There are two dirty solutions:
Cut off the application path to retrieve only the file name, then search for the file in the original sheet's folder (but what if the file were in a subfolder, like Include/tokens.xsl?)
Temporarily set current directory to the one in which main sheet resides:
var dir = System.IO.Directory.GetCurrentDirectory();
try
{
System.IO.Directory.SetCurrentDirectory(System.IO.Path.GetDirectoryName(result.FileName));
myXslTrans.Load(xmlReader, null, resolver);
}
finally
{
System.IO.Directory.SetCurrentDirectory(dir);
}
But I don't like this solution either. Is there a way to force the XslCompiledTransform to pass the original URLs to the resolver? Or possibly other, more generic solution to this problem?
If you have a file name or URI with the main stylesheet module then use the overload of the Load method taking a string (https://learn.microsoft.com/en-us/dotnet/api/system.xml.xsl.xslcompiledtransform.load?view=netframework-4.8#System_Xml_Xsl_XslCompiledTransform_Load_System_String_) with e.g. myXslTrans.Load(result.FileName).
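As a minimal sketch of that suggestion: loading the main stylesheet by its path gives the transform a base URI, so relative xsl:include references resolve against the stylesheet's own folder rather than the application folder. The temporary files, their contents, and the class name below are hypothetical. The example passes an explicit XmlUrlResolver, on the assumption that some newer .NET runtimes do not load external files with the default resolver.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class IncludeDemo
{
    public static string Run()
    {
        // Hypothetical stylesheet folder containing a main sheet and one included sheet.
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);

        File.WriteAllText(Path.Combine(dir, "tokens.xsl"),
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
            "<xsl:template match='token'>[<xsl:value-of select='.'/>]</xsl:template>" +
            "</xsl:stylesheet>");

        string mainSheet = Path.Combine(dir, "main.xsl");
        File.WriteAllText(mainSheet,
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
            "<xsl:include href='tokens.xsl'/>" +
            "<xsl:output method='text'/>" +
            "</xsl:stylesheet>");

        // Loading by path: the include is resolved relative to main.xsl.
        var xslt = new XslCompiledTransform();
        xslt.Load(mainSheet, XsltSettings.Default, new XmlUrlResolver());

        using (var input = XmlReader.Create(new StringReader("<token>abc</token>")))
        using (var output = new StringWriter())
        {
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

Because the base URI comes from the path passed to Load, an include such as Include/tokens.xsl would be looked up in a subfolder of the stylesheet's directory without any change to the current directory.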

Unable to save changes to XML document stored in Sharepoint 2010 Document Library

I am working on a project that requires all SQL connection and query information to be stored in XML files. To make my project configurable, I am trying to create a means to let the user configure his sql connection string information (datasource, catalog, username and password) via a series of text boxes. This input will then be saved to the appropriate node within the SQL document.
I can get the current information from the XML file, and display that information within text boxes for the user's review and correction, but I'm encountering an error when it comes time to save the changes.
Here is the code I'm using to update and save the xml document.
protected void submitBtn_Click(object sender, EventArgs e)
{
SPFile file = methods.web.GetFile("MyXMLFile.xml");
myDoc = new XmlDocument();
byte[] bites = file.OpenBinary();
Stream strm1 = new MemoryStream(bites);
myDoc.Load(strm1);
XmlNode node;
node = myDoc.DocumentElement;
foreach (XmlNode node1 in node.ChildNodes)
{
foreach (XmlNode node2 in node1.ChildNodes)
{
if (node2.Name == "name1")
{
if (node2.InnerText != box1.Text)
{
}
}
if (node2.Name == "name2")
{
if (node2.InnerText != box2.Text)
{
}
}
if (node2.Name == "name3")
{
if (node2.InnerText != box3.Text)
{
node2.InnerText = box3.Text;
}
}
if (node2.Name == "name4")
{
if (node2.InnerText != box4.Text)
{
}
}
}
}
myDoc.Save(strm1);
}
Most of the conditionals are empty at this point because I'm still testing.
The code works great until the last line, as I said. At that point, I get the error "Memory Stream is not expandable." I understand that using a memory stream to update a stored file is incorrect, but I can't figure out the right way to do this.
I've tried to implement the solution given in the similar question at Memory stream is not expandable but that situation is different from mine and so the implementation makes no sense to me. Any clarification would be greatly appreciated.
Using the MemoryStream constructor that takes a byte array as an argument creates a non-resizable instance of a MemoryStream. Since you are making changes to the file (and therefore the underlying bytes), you need a resizable MemoryStream. This can be accomplished by using the parameterless constructor of the MemoryStream class and writing the byte array into the MemoryStream.
Try this:
SPFile file = methods.web.GetFile("MyXMLFile.xml");
myDoc = new XmlDocument();
byte[] bites = file.OpenBinary();
using(MemoryStream strm1 = new MemoryStream()){
strm1.Write(bites, 0, (int)bites.Length);
strm1.Position = 0;
myDoc.Load(strm1);
// all of your edits to the file here
strm1.Position = 0;
// save the file back to disk
using(var fs = new FileStream("FILEPATH",FileMode.Create,FileAccess.ReadWrite)){
myDoc.Save(fs);
}
}
To get the FILEPATH for a Sharepoint file, it'd be something along these lines (I don't have a Sharepoint development environment set up right now):
SPFile file = methods.web.GetFile("MyXMLFile.xml");
var filepath = file.ParentFolder.ServerRelativeUrl + "\\" + file.Name;
Or it might be easier to just use the SaveBinary method of the SPFile class like this:
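// same code from above
// all of your edits to the file here
// write the edited document back into the (resizable) stream
strm1.SetLength(0);
myDoc.Save(strm1);
strm1.Position = 0;
// don't use a FileStream, just SaveBinary
file.SaveBinary(strm1);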
I didn't test this code, but I've used it in Sharepoint solutions to modify XML (mainly OpenXML) documents in Sharepoint lists. Read this blogpost for more information
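The difference between the two MemoryStream constructors can be seen in isolation, without SharePoint. This is a minimal sketch with hypothetical class and method names: a stream wrapped around an existing byte array cannot grow, while a stream created with the parameterless constructor can.

```csharp
using System;
using System.IO;

public static class MemoryStreamDemo
{
    // A stream created over an existing array has a fixed capacity.
    public static bool FixedStreamThrows()
    {
        byte[] data = { 1, 2, 3 };
        var fixedStream = new MemoryStream(data);
        fixedStream.Position = data.Length;
        try
        {
            // Writing past the end would require the stream to expand.
            fixedStream.WriteByte(4);
            return false;
        }
        catch (NotSupportedException)
        {
            // "Memory stream is not expandable."
            return true;
        }
    }

    // A stream created with the parameterless constructor grows as needed.
    public static long GrowableStreamLength()
    {
        byte[] data = { 1, 2, 3 };
        var growable = new MemoryStream();
        growable.Write(data, 0, data.Length);
        growable.WriteByte(4);
        return growable.Length;
    }
}
```

Saving an edited XML document that is longer than the original bytes hits the first case, which is the error reported in the question.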
You could look into using the XDocument class instead of XmlDocument class.
http://msdn.microsoft.com/en-us/library/system.xml.linq.xdocument.aspx
I prefer it because of the simplicity and it eliminates having to use Memory Stream.
Edit: You can append to the file like this:
XDocument doc = XDocument.Load(filePath);
doc.Root.Add(
new XElement("AnElementName",
new XAttribute("AnAttribute", "Some Value"),
new XElement("NestedElement", "Inner Text"))
);
doc.Save(filePath);
Or you can search for an element and update it like this:
doc.Root.Elements("TheElement").First(m =>
m.Attribute("AnAttribute").Value == "Some value to match").SetElementValue(
"TheElementToChange", "Value to set element to");
doc.Save(filePath);

Open XML adding images in multiple picture content controls

OK, the old question is gone and this is the new one:
@JasonPlutext, we decided to do it the way you suggested. The custom XML looks like:
<DATA>
<BLOCK>
<FNAME>Test</FNAME>
<LNAME>Test1</LNAME>
</BLOCK>
<PICTURE>
<SIG> domain\username</SIG>
</PICTURE>
</DATA>
The text controls are bound to $rowBlock.FNAME and $rowBlock.LNAME, and the picture content control is bound to $rowPicture.SIG.
The text from the XML is displayed, but there is no picture...
The picture is returned by a web service (the input parameter is the domain\username from <SIG>, and the picture is returned as byte[]).
//this is part of code where dealing with picture content control
byte[] pic = getPic(@"domain\username");
Paragraph tP = new Paragraph();
ParagraphProperties tParagraphProperties =
pControl.Descendants<ParagraphProperties>().FirstOrDefault();
tP.ParagraphProperties = (ParagraphProperties)tParagraphProperties.Clone();
...?...
Please suggest what to do next and how to bind picture?
thx
You could consider a slightly different approach.
You can bind a picture content control to an element in a custom xml part which contains a base64 encoded image.
If you do it this way, you can rely on Word to resolve the binding (ie update the image on the document surface with the one in the custom xml part). Or you can mimic what Word does yourself; docx4j.NET contains code to do that for you.
Doing it this way becomes a matter of just updating the custom xml part with the images you want.
Jason, I'm injecting base64-encoded image content as you said, but there is still no picture. In the customXml folder of the zipped document, item3.xml contains a base64 string inside the tag, but the media folder contains only the default image. I don't know what's wrong... My procedure is:
//first, searching for drawing inside current processing control
Drawing tDraw = pControl.Descendants<Drawing>().FirstOrDefault();
//if there is a drawing element, then clone control
OpenXmlElement tClone = (OpenXmlElement)pControl.Clone();
//then call method:
private static void insertPicture(OpenXmlElement pControl)
{
//WordprocessingDocument wordDoc = WordprocessingDocument.Open(dokument, true);
MainDocumentPart mainPart = dokument.MainDocumentPart;
CustomXmlPart customPart = mainPart.CustomXmlParts.FirstOrDefault();
//convert image into string
string picName = @"c:\temp\picasso.png";
System.IO.FileStream fileStream = System.IO.File.Open(picName, System.IO.FileMode.Open);
System.IO.BinaryReader br = new System.IO.BinaryReader(fileStream);
byte[] byteArea;
byteArea = br.ReadBytes(System.Convert.ToInt32(fileStream.Length));
string picString = System.Convert.ToBase64String(byteArea);
//Load the XML template
string DataString = iData["DATA"].ToString();
//Properties.Resources.XMLData;
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(DataString);
//change the value
XmlNodeList xmlNode = xmlDoc.GetElementsByTagName("picture");
xmlNode[0].InnerText = picString;
//write the custom xml data into the customxmlpart
System.Xml.XmlTextWriter writer = new System.Xml.XmlTextWriter(customPart.GetStream(System.IO.FileMode.Create), System.Text.Encoding.UTF8);
writer.WriteRaw(xmlDoc.InnerXml);
writer.Flush();
writer.Close();
fileStream.Close();
br.Close();
mainPart.Document.Save();
//dokument.Close();
}
//then append control to document
OpenXmlElement tC1 = pControl;
IEnumerable<Run> tEl1 = tClone.Descendants<Run>();
if (tEl1.Count() != 0)
{
foreach (OpenXmlElement tElement in tEl1.Reverse())
{
OpenXmlElement tClone1 = (OpenXmlElement)tElement.Clone();
tC1.InsertBeforeSelf(tClone1);
tC1 = tClone1;
}
}

Overwrite file but still read content

I'm trying to find the most reasonable way to open a file, modify its content and then write it back to file.
If I have the following "MyFile.xml"
<?xml version="1.0" encoding="utf-8"?>
<node>
<data>this is my data which is long</data>
</node>
And then want to modify it according to this:
private static void Main(string[] args)
{
using (FileStream stream = new FileStream("Myfile.xml", FileMode.Open))
{
XDocument doc = XDocument.Load(stream);
doc.Descendants("data").First().Value = "less data";
stream.Position = 0;
doc.Save(stream);
}
}
I get the following result. Note that, since the total file length is less than before I get incorrect data at the ending.
<?xml version="1.0" encoding="utf-8"?>
<node>
<data>less data</data>
</node>/node>
I guess I could use File.ReadAll* and File.WriteAll* but that would mean two File openings. Isn't there some way to say "I want to open this file, read its data and when I save delete the old content" without closing and reopening the file? Other solutions that I have found include FileMode.Truncate, but that would imply that I cannot read the content.
You'll have to use FileStream.SetLength like this:
stream.SetLength(stream.Position);
After you have finished writing.
Of course, assuming that the position is at the end of the written data.
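A minimal sketch of the SetLength approach applied to the code from the question: the file is opened once, read, rewritten from position 0, and then truncated at the end of the newly written data. The class and method names are hypothetical, and the path is supplied by the caller.

```csharp
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class TruncateDemo
{
    // Rewrites the first <data> element in place and returns the resulting file text.
    public static string Rewrite(string path)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            XDocument doc = XDocument.Load(stream);
            doc.Descendants("data").First().Value = "less data";

            // Overwrite from the beginning of the file.
            stream.Position = 0;
            doc.Save(stream);

            // Cut off whatever remains of the old, longer content.
            stream.SetLength(stream.Position);
        }
        return File.ReadAllText(path);
    }
}
```

Without the SetLength call, the tail of the previous content stays in the file, which produces the stray "/node>" shown in the question.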
Why do you read the file into a filestream first?
You can do the following:
private static void Main(string[] args)
{
string path = "MyFile.xml";
XDocument doc = XDocument.Load(path);
// Check if the root-Node is not null and other validation-stuff
doc.Descendants("data").First().Value = "less data";
doc.Save(path);
}
The problem with the stream is that you can either read or write.
I've read that with .NET Framework 4.5 it's also possible to read and write on a stream, but I haven't tried it yet.

Lost XML file declaration using DataSet.WriteXml(Stream)

I had a Dataset with some data in it. When I tried to write this DataSet into a file, everything was OK. But When I tried to write it into a MemoryStream, the XML file declaration was lost.
The code looks like:
DataSet dSet = new DataSet();
//load schema, fill data in
dSet.WriteXml("testFile.xml");
MemoryStream stream = new MemoryStream();
dSet.WriteXml(stream);
stream.Seek(0,SeekOrigin.Begin);
When I opened file testFile.xml, I got:
<?xml version="1.0" standalone="yes"?>
//balabala
But When I open the stream with StreamReader, I only got:
//balabala
Somebody said I can insert XML file declaration in my stream manually. It works but seems so ugly. Do you know why it dropped the first line and any more simple solution?
It wasn't dropped; it simply was never included. Although it is highly recommended, the XML declaration is not a required element of the XML specification.
http://msdn.microsoft.com/en-us/library/ms256048(VS.85).aspx
You can use XmlWriter.WriteStartDocument to include the xml declaration in the stream like so:
MemoryStream stream = new MemoryStream();
var writer = XmlWriter.Create(stream);
writer.WriteStartDocument(true);
dSet.WriteXml(writer);
writer.Flush();
I tried your solution with a DataTable and it did not work correctly.
using (MemoryStream stream = new MemoryStream()) {
using (XmlTextWriter writer = new XmlTextWriter(stream, Encoding.UTF8)) {
writer.WriteStartDocument(); //<?xml version="1.0" encoding="utf-8"?>
writer.WriteRaw("\r\n"); //endline
writer.Flush(); //Write immediately the stream
dTable.WriteXml(stream);
}
}
If you disassemble the 2.0 code you'll see that the WriteXml method that takes a file name explicitly writes out the declaration (XmlWriter.WriteStartDocument), but the WriteXml methods that take a stream or writer do not.
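Given that behaviour, one way to get the declaration into a MemoryStream is to write it through an XmlWriter and pass that same writer to WriteXml. The following is a minimal sketch; the class name, table, and column are hypothetical sample data, and the UTF-8 encoding without a byte order mark is an assumption chosen so the resulting string starts directly with the declaration.

```csharp
using System.Data;
using System.IO;
using System.Text;
using System.Xml;

public static class DeclarationDemo
{
    public static string Write()
    {
        // Hypothetical sample data standing in for the real DataSet.
        var dSet = new DataSet("root");
        DataTable table = dSet.Tables.Add("item");
        table.Columns.Add("name", typeof(string));
        table.Rows.Add("Fubar");

        using (var stream = new MemoryStream())
        {
            var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                // Emit the declaration (with standalone="yes") ourselves...
                writer.WriteStartDocument(true);
                // ...then let the DataSet write its content through the same writer.
                dSet.WriteXml(writer);
                writer.WriteEndDocument();
            }
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```

Writing through a single writer avoids interleaving two independent writers on the same stream, which is what makes the manual Flush necessary in the variant above.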
