Can I generate .MD files from XML Documentation?


I recently used Sandcastle to document my class libraries. However, I have since discovered that Sandcastle is no longer developed by Microsoft: https://archive.codeplex.com/?p=sandcastle.
I have had a look on GitHub and I see that lots of open source projects use .MD (Markdown) files these days. I see that .MD files are supported by GitHub and TFS. How can I generate .MD files from XML documentation files?
I have spent the last few hours Googling this. For example, I have found this, which has only a few hundred downloads: https://github.com/lijunle/Vsxmd/tree/master/Vsxmd. I am not asking for tool recommendations; I am asking whether it is possible to convert XML documentation to .MD files.

Yes, it is possible.
There are a number of projects out there that do this: Vsxmd, which you found, for example, but also simpler ones like lontivero/593fc51f1208555112e0.
The process is basically reading each XML element (or some of them) and converting it to its Markdown equivalent. You must choose that equivalent yourself when writing the converter; it depends on the style you want for the resulting file.
Things get complex when dealing with tags like <inheritdoc/>, if you don't want to just print "Inherit from parent." as Vsxmd does or ignore the tag as lontivero's script does.
I'm still searching for a better tool, or for the time to improve one.
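To make the element-by-element conversion concrete, here is a minimal sketch in Python (chosen only for brevity; the same walk is straightforward with XDocument in C#). It assumes the usual compiler output layout (doc > assembly > name, doc > members > member with summary, param and returns children). The function name, the KINDS table and the Markdown layout are my own choices for illustration, not what Vsxmd or the lontivero gist actually do, and <inheritdoc/> is only flagged, not resolved.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the member-ID prefix to a readable label.
KINDS = {"T": "Type", "M": "Method", "P": "Property", "F": "Field", "E": "Event"}


def _text(element):
    """Flatten an element's text (including nested tags) and collapse whitespace."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def xml_doc_to_markdown(xml_text):
    """Convert compiler-generated XML documentation into one Markdown string."""
    root = ET.fromstring(xml_text)
    assembly = root.findtext("assembly/name", default="Assembly")
    lines = [f"# {assembly.strip()}", ""]
    for member in root.iterfind("members/member"):
        # Member names look like "M:Namespace.Type.Method(System.Int32)".
        prefix, _, name = member.get("name", "").partition(":")
        lines += [f"## {KINDS.get(prefix, 'Member')}: `{name}`", ""]
        if member.find("inheritdoc") is not None:
            # Resolving inherited text needs type information; this sketch only flags it.
            lines += ["*Documentation inherited from parent.*", ""]
        summary = _text(member.find("summary"))
        if summary:
            lines += [summary, ""]
        params = member.findall("param")
        if params:
            lines += ["| Parameter | Description |", "| --- | --- |"]
            lines += [f"| `{p.get('name')}` | {_text(p)} |" for p in params]
            lines.append("")
        returns = _text(member.find("returns"))
        if returns:
            lines += [f"**Returns:** {returns}", ""]
    return "\n".join(lines)
```

Empty tags such as <see cref="..."/> carry no text, so this sketch silently drops them; a real converter would have to decide how to render those as well.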

In my scenario, I wanted code documentation for my project in the GitHub Wiki, which takes Markdown files, not HTML.
Visual Studio (for Mac, in my case) has a checkbox under project options > build > compiler to generate the XML documentation file for the project. Vsxmd generates one md file from this one XML file. The result is extensive and well laid out in most Markdown previewers, but GitHub seems to have low timeouts and cannot render a large Markdown page.
Something I found that GitHub can render, although it is not laid out as well, is the single md file generated by the web tool vsdoc-2-md.
What I ended up using for my project is Default Documentation. It is a NuGet package that runs at build time just like Vsxmd, but generates multiple md files that are small enough for GitHub not to complain.
Just note that GitHub Wiki strips the .md extension from file names in its URLs. I had to do a search/replace to strip .md from all the links so that clicking around the documentation in the GitHub Wiki works.
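For reference, the search/replace can be scripted. The following is a rough sketch in Python, not the exact replacement I ran: the pattern and the function name are assumptions, it only handles inline links of the form [text](target.md), and it deliberately leaves targets containing "://" alone so external links keep their extension.

```python
import re

# Matches the target of an inline Markdown link, e.g. "](Foo.md)" or "](Foo.md#bar)".
# Targets containing "://" are skipped so that external links stay untouched.
LINK_TARGET = re.compile(r"\]\((?![^)]*://)([^)#\s]+)\.md(#[^)\s]*)?\)")


def strip_md_extensions(markdown):
    """Rewrite relative links like [Foo](Foo.md) to [Foo](Foo)."""
    return LINK_TARGET.sub(lambda m: f"]({m.group(1)}{m.group(2) or ''})", markdown)
```

Running this over every generated file before pushing to the wiki (for example by looping over pathlib.Path(folder).glob("*.md") and rewriting each file) should give the same result as the manual search/replace.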

Yes, it is possible.
The available solutions did not suit my needs, so (disclaimer) I've started developing my own: MarkDoc.Core.
This is a standalone application, so no NuGet packages are involved. Moreover, the application is fully customizable, allowing you to export to formats other than Markdown, and more.
Again, this is my project and at the time of writing this answer, it is still in progress.

Of course it is possible to generate markdown from XML comments. The obvious question is: Why? The best answer is: to generate a website from the markdown content.
DocFX is essentially the replacement for Sandcastle. I'm using it to generate a documentation website from XML comments in C# code.

Related

.NET Core PDF and RTF document generation

Looking for suggestions for libraries that can generate PDF and RTF documents from stored data (not "HTML to PDF" or "URL to PDF"), with all the functionality for adding images, encryption, etc. We are currently looking for an alternative to PDFSharp-MigraDoc-GDI, which, although it works with .NET Core, does not fully support it, and we see compiler warnings: "This package may not be compatible with your project". We have also been getting issues on the IIS tier regarding GDI+. We've decided to play it safe and find an alternative. Does anyone have a solution that they would recommend? Thanks

As far as I know, you can write whole new documents using the Microsoft.Office.Interop library. Here is a post that talks about it (be careful about deploying things like this; you might need an Office installation running on the server):
https://www.c-sharpcorner.com/UploadFile/muralidharan.d/how-to-create-word-document-using-C-Sharp/
And I've found this post about using the library to print PDFs:
How do I convert Word files to PDF programmatically?
It's not much, but I hope it helps. Regards!

Create single HTML / markdown file using Sandcastle Documentation

Simple question here.
I'm using Sandcastle Help File Builder to generate documentation of a C# project in Markdown or HTML format. My issue is that it creates a separate .md / .html file for every item that's documented. Is there any way to force it to generate a single large file containing all the documentation, similar to how Visual Studio outputs one single XML file, or at minimum to use a directory system to structure the output a little better, rather than having one folder with ~200 Markdown / HTML files?
Thanks.

For the HTML part of the question, it is possible to open the generated index.html file in a browser (may need to enable scripting for the page to work correctly), then save it as a single file in MHTML format. According to the linked page, the current versions of most major browsers support MHTML nowadays.

Server Side HTML to PDF

I'm trying to find a C# library that will allow me to "Print" one of my HTML pages to a PDF file. I can't seem to find out if one currently exists that will allow you to do this. I've found several that will let you build a page, but haven't noticed if one would generate the pdf only based off of HTML.
EDIT: I'm not allowed a budget on this at work so it will need to be an open source/free product. If not I'm aware of iTextSharp and will have to generate the pdf programmatically (which is what I'm hoping to avoid :) )

I've had a lot of luck with ActivePDF WebGrabber. It's kind of odd to use compared to standard managed libraries (ActivePDF is unmanaged), but it gets the job done.

iTextSharp comes with a little companion: XML Worker.
For a demo, have a look here
Even though the documentation refers to the Java API, the adaptation to C# should be straightforward.

I've experimented with iTextSharp and it works for basic conversion, but gets complicated when you get into styles and formatting. I've also heard wkhtmltopdf is out there as another option.

Batch conversion of docx to clean HTML

I'm starting to wonder if this is even possible. I've searched for solutions on Google and come up with nothing that works exactly how I'd like it to.
I think it would help to explain what that entails. I work for the database group at my university's IT department. My main job is to take the specs of a report in a docx file, copy them over to Dreamweaver, fix some formatting, and put the result onto their website. My issue is that it's ridiculously tedious to do this over and over. I figured, hey, I haven't written anything in C# for some time now, perhaps I could write an application to grab a docx file, convert it to HTML, fix the CSS, stick the header and footer from the webpage on there, and save the result. I originally planned to have it process files one by one, but it probably wouldn't be difficult to have it take a list of files and batch convert.
I've found these relevant topics on how to accomplish this, but they don't fit my needs well enough.
http://www.techrepublic.com/blog/howdoi/how-do-i-modify-word-documents-using-c/190
This is probably fine for a few documents, but since it's just automating an instance of Word, I feel like it'd be slow and memory intensive. I'd prefer to avoid opening and closing an instance of Word 50+ times.
http://openxmldeveloper.org/articles/333.aspx
This is what I started using. XSLT has the benefit of not needing Word to be installed or run for each file. After some searching I got a proof of concept working. It takes in a docx file, decompresses it, grabs the document.xml from that, and transforms it with the DocX2Html.xsl file I scavenged from OpenXML Viewer. I believe that was originally provided by MS for SharePoint servers to provide the ability to render Word documents in a browser. Or something along those lines.
After adjusting that code to fit my needs, and having issues with the objXSLT.Load() method, I ended up using ILMerge to make the XSL into a DLL. No idea why I kept getting a compile error when using the plain old XSL file, but the DLL worked fine, so I was satisfied. Here (http://pastebin.com/a5HBAakJ) is my current code. It does the job of converting docx to HTML just fine (other than random spaces between some words), but the resulting file has ridiculously ugly HTML syntax. An example of this monstrosity can be found here (http://pastebin.com/b8sPGmFE).
Does anyone know how I could remedy this? I'm thinking perhaps I need to make a new XSL file, as the one MS provided is what's responsible for sticking all those tags and extra code in there. My issue with that is that I don't know anything about how to do that. Perhaps there's an alternative version already out there. All I'd need is one that will preserve tables and text formatting. Images aren't needed.
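To illustrate the "decompress it and grab document.xml" step without any XSLT, here is a small sketch in Python. It is not the pastebin code above and it swaps the XSL transform for a direct walk over the XML: a docx file is a zip archive, the main part lives at word/document.xml, and the text sits in w:t elements inside w:p paragraphs. The function name is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(docx_file):
    """Return the plain text of each paragraph in a .docx (path or file object)."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W + "p"):
        # A paragraph's text is split across runs; w:t elements hold the characters.
        paragraphs.append("".join(t.text or "" for t in paragraph.iter(W + "t")))
    return paragraphs
```

Emitting one clean <p> per returned string (and handling w:tbl for tables in the same way) would sidestep the heavy markup the MS stylesheet produces, at the cost of reimplementing whatever formatting needs to be preserved.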

This looks like just what you need: http://msdn.microsoft.com/en-us/library/ff628051(v=office.14).aspx
The author Eric White blogged about his experiences developing that tool. You can see that list of posts on his blog here: http://blogs.msdn.com/b/ericwhite/archive/2008/10/20/eric-white-s-blog-s-table-of-contents.aspx#Open_XML_to_XHtml

Since I'm a big fan of Aspose.Words, a commercial library to create/process Word documents, I would do something like:
Open the Word document with Aspose.Words.
Save the Word document as HTML.
Use something like SgmlReader or HTML Agility Pack (or even Regular Expressions if it is suitable) to remove unwanted HTML tags/attributes.
Since you wrote you work at an university, I'm not sure whether commercial packages are an option, though.
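The clean-up step is independent of how the HTML was produced, so here is a rough sketch of it in Python using only the standard library (in C# the same idea maps onto HTML Agility Pack). The whitelist is an assumption on my part based on the question's "tables and text formatting" requirement, and the class and function names are made up for this example.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: only these tags matter for the report pages (tables and basic formatting).
ALLOWED_TAGS = {"p", "b", "i", "u", "strong", "em", "br", "table", "tr", "td", "th",
                "ul", "ol", "li", "h1", "h2", "h3"}
VOID_TAGS = {"br"}               # tags that never get a closing tag
SKIPPED_TAGS = {"style", "script"}  # tags whose content is dropped entirely


class HtmlCleaner(HTMLParser):
    """Re-emit HTML keeping only allowed tags, with every attribute dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag in ALLOWED_TAGS:
            self.parts.append(f"<{tag}>")  # attributes are intentionally discarded

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(escape(data, quote=False))


def clean_html(html):
    """Return html reduced to whitelisted, attribute-free tags plus text."""
    cleaner = HtmlCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.parts)
```

Dropped tags such as span keep their text content, so nothing visible is lost; only the wrapper markup and inline styling go away.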

Hi, I'm not sure what the rules are on promoting your own solutions, so do let me know if I am out of line.
I am a web developer who had the same issues, so I created my own tool:
http://www.convertwordtohtml.com
We are also working on a new version that will have even better conversion quality and one-click conversion, e.g. you can right-click on a Word file and it will be directly converted to HTML and the code placed on the clipboard. The current version also supports command-line access, and the new version will have a server version too.
There is a free trial version downloadable from the site, and if you have any questions, do contact me any time.

Merge documents

I'm trying to merge two docx documents into one docx document using OpenXML SDK 2.0. The documents should be merged without losing their styling and custom headers and footers. I hope I can achieve this using AltChunk and a section break. But I can't get it working.
Is it possible what I'm trying to do? Can someone give me a hint how to achieve this?

The above answer is NOT correct at all! This is EXACTLY what AltChunk has been designed to do, and it works great!
NOTE: the documents will not actually be merged into one document UNTIL Word opens the file for the first time (and, obviously, the file then has to be saved or the file on disk won't be updated).
See this blog for more information on how to do it properly:
https://blogs.msdn.com/b/ericwhite/archive/2008/10/27/how-to-use-altchunk-for-document-assembly.aspx?Redirected=true
p.s. As for examining Open XML using the productivity tool, my opinion is to just install the official Visual Studio Open XML add-on and open the Office Documents from Visual Studio to examine them, it's super convenient! :-)

Using the 'Open XML Productivity Tool' I analyzed the structure of a docx document, and concluded that merging documents with their styles, headers, footers, ... is not possible out of the box using AltChunk. You can download the tool separately from the Open XML SDK.
What I'm doing now, and what is working, is copying everything manually into the document, making sure that all style references, header references, footer references, ... are preserved. This means that I give them a new unique id before I copy them into the document and change all references from the old id to the new id. There is a lot of code to do this, but the tool mentioned above really helped.
Adding a section break is also quite difficult. You should know that the SectionProperties tag describes all the properties of a section and that there can be one SectionProperties tag under the Body tag, describing the properties of the last section. So adding a new section break means copying the last SectionProperties tag into the last paragraph of the section and adding a new SectionProperties tag under the Body tag. I also got a lot of information from the productivity tool.
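The section-break shuffle is easier to see on the raw XML than in prose, so here is a minimal sketch of just that step in Python with ElementTree (with the Open XML SDK the same moves apply to the Body, Paragraph and SectionProperties objects). It is not my actual merge code: the function names are invented for this example, it assumes w:sectPr can simply be appended to w:pPr (no w:pPrChange present), and it does nothing about remapping style, header or footer ids, which still has to happen separately as described above.

```python
import copy
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"
ET.register_namespace("w", W_NS)  # keep the conventional "w:" prefix when serializing


def end_section_with_break(body):
    """Move the body-level w:sectPr into the last paragraph, ending that section."""
    sect_pr = body.find(W + "sectPr")
    if sect_pr is None:
        return body  # nothing to preserve; default section settings are in use
    body.remove(sect_pr)
    last = body[-1] if len(body) else None
    if last is None or last.tag != W + "p":
        # The section ends in a table (or is empty): add a paragraph to carry the break.
        last = ET.SubElement(body, W + "p")
    p_pr = last.find(W + "pPr")
    if p_pr is None:
        p_pr = ET.Element(W + "pPr")
        last.insert(0, p_pr)  # w:pPr must be the first child of w:p
    p_pr.append(sect_pr)
    return body


def append_section(target_body, source_body):
    """Append source_body's content to target_body as a new section."""
    end_section_with_break(target_body)
    for child in source_body:
        # Deep copies leave the source tree intact; the source's trailing w:sectPr
        # becomes the new body-level w:sectPr describing the last section.
        target_body.append(copy.deepcopy(child))
    return target_body
```

After append_section, the first document's page settings travel with its last paragraph and the second document's w:sectPr is the one directly under the body, which is the layout described in the paragraph above.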
