Accessing a .docx file's XML programmatically? - c#

If you take a .docx file, rename it to .zip, and unzip it, you can view its .xml files. I'm building a program to programmatically inspect these XML properties (no existing API seems to suffice as our company is using a 3rd party program that attaches custom XML to files, and that program does not have an API).
Is there a clean way to access this XML without programmatically saving copies of files as .zip files, opening them, taking out only the XML and then deleting the rest?

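Use the Open XML SDK to fetch all the XML elements:
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;

// Open read-only (false); the using block disposes the package when done.
using (WordprocessingDocument document = WordprocessingDocument.Open(this.FilePath, false))
{
    MainDocumentPart mainPart = document.MainDocumentPart;
    // Direct children of <w:body>: paragraphs, tables, section properties, etc.
    List<OpenXmlElement> ParagraphElements = mainPart.Document.Body.ChildElements.ToList();
}
ParagraphElements now holds the top-level elements of the document body; call Descendants() on them to reach nested elements such as runs and text.
This is an easy way to access the XML elements in the document. If the third-party program stores its data as custom XML parts, those are reachable through mainPart.CustomXmlParts rather than through the body.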

Have you tried the Open XML SDK for Office?
It allows you to access the XML files inside .docx files.
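Regarding the rename/copy/unzip round trip in the question: a .docx is an ordinary ZIP package, so System.IO.Compression.ZipArchive can read its parts straight from a FileStream or MemoryStream, with nothing written to disk. This is a minimal sketch, not a replacement for the SDK. The class and method names are hypothetical, and it assumes Word's usual layout, where custom XML data sits in customXml/itemN.xml (the package relationships, which the Open XML SDK follows, are the authoritative way to locate parts). On .NET Framework 4.5+ it needs a reference to the System.IO.Compression assembly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: reads XML parts directly from a .docx stream.
public static class DocxParts
{
    // Full names of every XML entry in the package, e.g. "word/document.xml".
    public static List<string> ListXmlParts(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            return zip.Entries
                      .Select(e => e.FullName)
                      .Where(n => n.EndsWith(".xml", StringComparison.OrdinalIgnoreCase))
                      .ToList();
        }
    }

    // Custom XML data parts. Assumption: Word's usual customXml/itemN.xml layout;
    // the itemPropsN.xml files only describe those parts, so they are skipped.
    public static List<string> CustomXmlItems(Stream docx)
    {
        return ListXmlParts(docx)
            .Where(n => n.StartsWith("customXml/item", StringComparison.Ordinal)
                     && !n.StartsWith("customXml/itemProps", StringComparison.Ordinal))
            .ToList();
    }

    // Loads one part as an XDocument, or returns null if the part does not exist.
    public static XDocument ReadPart(Stream docx, string partName)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry(partName);
            if (entry == null)
                return null;
            using (Stream s = entry.Open())
            {
                return XDocument.Load(s);
            }
        }
    }
}
```

For example, open the file with File.OpenRead(path), pass the stream to CustomXmlItems, and then call ReadPart for each name it returns.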

Related

How to convert a Visio .xml document to a .csv file in C#

I have multiple XML files, generated by converting .vsdx (Visio) files to .xml. Now I want to convert the generated .xml files to .csv files.
My problem is that the files have hundreds of lines, and I can't follow the XML Shape tags well enough to extract a CSV file from them.
I have worked with this tool, but its output structure is very complex.
Is there any way to do this?
Normally one solves this with XSLT. If you want to create the CSV file programmatically, I would query the XML file with LINQ to XML, append each record as a CSV-formatted line to a StringBuilder, and finally write the StringBuilder's contents to the target file.
I haven't worked with Visio-based XML files, but given that XML is alike regardless of its source (which it should be), I'd go with LINQ to XML: read the content into an XDocument (rather than XmlDocument, which is the older API, though not formally obsolete) and then turn it into the output string in whatever way you're comfortable with.
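The LINQ to XML plus StringBuilder approach could look roughly like this. It is a sketch under assumptions, since the question doesn't show the exact Visio XML: every element whose local name is Shape is treated as one CSV record, built from its ID and NameU attributes and the text of an optional Text child. Matching on LocalName avoids hard-coding the Visio namespace, which differs between Visio XML formats. The class name VisioCsv and the chosen columns are hypothetical; adapt them to the fields you actually need.

```csharp
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Hypothetical converter: one CSV line per <Shape> element.
public static class VisioCsv
{
    // Quote every field and double any embedded quote (common CSV convention).
    static string Field(string value) => "\"" + (value ?? "").Replace("\"", "\"\"") + "\"";

    public static string ShapesToCsv(XDocument doc)
    {
        var sb = new StringBuilder();
        sb.AppendLine("ID,NameU,Text");
        // LocalName match: works whatever namespace the Visio XML declares.
        foreach (XElement shape in doc.Descendants().Where(e => e.Name.LocalName == "Shape"))
        {
            string id = (string)shape.Attribute("ID");      // null if absent
            string name = (string)shape.Attribute("NameU"); // null if absent
            XElement text = shape.Elements().FirstOrDefault(e => e.Name.LocalName == "Text");
            sb.Append(Field(id)).Append(',')
              .Append(Field(name)).Append(',')
              .AppendLine(Field(text?.Value.Trim()));
        }
        return sb.ToString();
    }

    // Reads one XML file and writes the CSV to csvPath.
    public static void ConvertFile(string xmlPath, string csvPath) =>
        File.WriteAllText(csvPath, ShapesToCsv(XDocument.Load(xmlPath)));
}
```

Descendants() also picks up shapes nested inside group shapes; filter further if you only want the top-level ones.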
It depends a bit on how complex files are and how invasive operations that you'll need to carry out.
As for the tool you mention, I haven't seen it before but it could be a better idea not to use it as it strikes me as a bit outdated. However, I only glanced at it without scrutinizing. If you only need to see the data for yourself (human based analysis of the contents), you might perhaps use a decent text editor with some appropriate plugin (such as Notepad++ and its XML add-on).

Cannot embed an openXML document inside another using OpenXML SDK

I'm trying to embed an OpenXML Word document inside another OpenXML Word document, and it's creating corrupted files. I've already reviewed this post
http://blogs.msdn.com/b/brian_jones/archive/2009/06/30/embedding-an-open-xml-file-in-another-open-xml-file.aspx and this Stack Overflow post, "Embedding an OpenXML document within another OpenXml document", and I think I'm following the instructions, but I'm not getting the same results.
My code looks something like this.
foreach (KeyValuePair<string, byte[]> docKV in files)
{
    // Relationship id is "rfp" + the file's key; the part receives the raw .docx bytes.
    var embeddedPackagePart = document.MainDocumentPart.AddNewPart<EmbeddedPackagePart>(
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        "rfp" + docKV.Key);
    using (var stream = new MemoryStream(docKV.Value))
    {
        embeddedPackagePart.FeedData(stream);
    }
}
So essentially I'm looping through a list of files stored in a dictionary and embedding each file. I've removed the code that creates the actual OpenXML markup to link to the embedded object to try and narrow down the error. As far as I can tell the actual embedding is what creates the issue.
I've also opened the generated document using the OpenXML SDK Productivity tool, and it is missing the embeddings section that I see on documents where I manually embedded a file.
Any thoughts on what I'm doing wrong?
You can read "Merging word processing documents"; sample code is available too.
I figured out the issue. I was not calling WordprocessingDocument.Close before disposing of the WordprocessingDocument. If you are adding new document parts, you have to call Close() so that they get written to the underlying stream.

Embedding a word file as a resource

I am trying to access a Word file from my C# code. I want to embed it in the project so that when I create a Word document object from an .aspx page, the object has access to that document no matter where it is used from. How can I include a Word document as a resource in my project? Also, to open the document I need a path to it. After I have included the file as a resource, how do I get its "path" so the Word document object can be created? Is there some way to copy it to a temporary location on whatever machine is calling that object?
Thanks
Note: do not automate Microsoft Word itself in a server environment.
See: How to embed a file as a resource.
After getting the resource stream, use Stream.CopyTo to copy it into a FileStream created on a temporary file (Path.GetTempFileName) or in the temp folder (Path.GetTempPath).
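The resource-to-temp-file approach can be sketched as follows, assuming the .docx has been added to the project with its Build Action set to "Embedded Resource". ResourceFiles, ExtractResource and the resource name in the comment are hypothetical. Manifest resource names are usually the project's default namespace plus the folder path and file name, joined with dots; Assembly.GetManifestResourceNames() lists the actual names if in doubt.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical helper: turns an embedded resource into a real file on disk.
public static class ResourceFiles
{
    // Copies any stream into a new file in the temp folder and returns its path.
    // The caller is responsible for deleting the file afterwards.
    public static string CopyToTempFile(Stream source, string extension)
    {
        string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + extension);
        using (FileStream target = File.Create(path))
        {
            source.CopyTo(target);
        }
        return path;
    }

    // resourceName is the manifest name, e.g. the hypothetical "MyProject.Templates.Letter.docx".
    public static string ExtractResource(Assembly assembly, string resourceName)
    {
        using (Stream resource = assembly.GetManifestResourceStream(resourceName))
        {
            // GetManifestResourceStream returns null (it does not throw) for unknown names.
            if (resource == null)
                throw new FileNotFoundException("Embedded resource not found: " + resourceName);
            return CopyToTempFile(resource, Path.GetExtension(resourceName));
        }
    }
}
```

A GUID file name is used here instead of Path.GetTempFileName so that the file keeps its .docx extension, which some consumers rely on.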

How do I extract data from a .docx file using DocumentFormat.OpenXml?

I have a .docx file created by adding the XML schema. I am giving you the link to the .docx file.
Now I want to extract To, From, heading and body.
Currently I am using the DocumentFormat.OpenXml library,
but I didn't succeed.
Can anyone suggest the steps?
You will have to explore DocumentFormat.OpenXml to extract your data from the .docx file.
Another method is:
First, change the extension of the .docx file to .zip.
Open the zip file and then open document.xml (it is in the word folder).
In this file you will find all your data.
Now all you need to do is read that XML file in C# and extract the data.
You can use the XmlDocument class to extract the data.
I think this will be useful.
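The renaming step isn't strictly necessary: document.xml can be read in place through System.IO.Compression, and this sketch swaps XmlDocument for LINQ to XML. It assumes the main part is at word/document.xml, which is where Word puts it, and it only returns each paragraph's plain text; text boxes, tabs and line breaks are not handled specially. Mapping paragraphs to To, From, heading and body depends on how your document is built (content controls, for instance, would have to be matched by their w:tag instead). DocxText and ParagraphTexts are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: plain text of each paragraph in a .docx.
public static class DocxText
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static List<string> ParagraphTexts(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("No word/document.xml part found.");
            XDocument doc;
            using (Stream part = entry.Open())
                doc = XDocument.Load(part);

            // A paragraph's text is split across runs, so join every <w:t> inside each <w:p>.
            return doc.Descendants(W + "p")
                      .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)))
                      .ToList();
        }
    }
}
```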

Word Template, populate fields, C# .NET

I have a Word document (.docx format) that I need to populate with data from a dictionary. The dictionary keys are the Word fields to be populated. I was wondering what the best way to do this is (bookmarks, mail merge fields, etc.). I did a little work with bookmarks, but I would like to be able to reuse some of the fields (like a "first_name" field), and bookmarks are unique (or so I think).
A few lines of code on how to do this would be really helpful. (c# .net 4.0)
You can create XPath data bindings in a .docx document. A .docx file is just a zipped package in the Office Open XML format, which can be opened in .NET without third-party libraries:
http://msdn.microsoft.com/en-us/library/system.io.packaging.package.open.aspx
You can then supply the .docx file with an XML file in your own format. The only thing you then have to do is edit the embedded XML file, not the document itself. See:
http://blogs.msdn.com/b/mikeormond/archive/2008/06/20/word-2007-content-controls-databinding-and-schema-validation.aspx
http://blogs.msdn.com/b/acoat/archive/2007/03/01/linking-word-2007-content-controls-to-custom-xml.aspx
And a custom tool to do the bindings:
http://dbe.codeplex.com/
You could probably also use some .NET interaction with Word itself, but I can't help you with that.
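For the data-binding approach described above, the "edit the embedded XML file" step could be sketched with System.IO.Compression and LINQ to XML alone. It assumes the template already contains content controls bound by XPath to elements of a custom XML part (set up as in the linked articles) and that each dictionary key equals an element's local name. The default part name customXml/item1.xml and the CustomXmlFiller class are hypothetical; check your own package for the real part. Since several content controls can bind to the same element, a field such as first_name can appear in several places, which sidesteps the uniqueness of bookmarks.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: writes dictionary values into a custom XML part of a .docx.
public static class CustomXmlFiller
{
    public static void Fill(Stream docx, IDictionary<string, string> values,
                            string partName = "customXml/item1.xml")
    {
        // Update mode needs a readable, writable, seekable stream.
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Update, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry(partName)
                ?? throw new InvalidDataException("Part not found: " + partName);

            XDocument data;
            using (Stream s = entry.Open())
                data = XDocument.Load(s);

            // Set the first element whose local name matches each key; unknown keys are ignored.
            foreach (KeyValuePair<string, string> pair in values)
            {
                XElement target = data.Descendants().FirstOrDefault(e => e.Name.LocalName == pair.Key);
                if (target != null)
                    target.Value = pair.Value;
            }

            // Replace the old entry with the updated XML.
            entry.Delete();
            using (Stream s = zip.CreateEntry(partName).Open())
                data.Save(s);
        }
    }
}
```

This only changes the XML data; the text cached in word/document.xml is left as it was, and Word is expected to refresh the bound controls from the custom XML when the document is opened.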
