I have a Word document template in which only a few fields change.
I remember looking around using google-fu and found that you can bind XML to content controls and dynamically generate Word documents through that method.
Unfortunately it's been a while since I revisited this particular problem, and all I remember about it is that it was unnecessarily clunky and hard to manage.
Are there any open-source solutions that are more elegant? Or a better way to go at this?
I think it is fair to describe content control XML data binding as the latest in a series of techniques which enable Word document generation/automation.
Content control data binding was introduced in Word 2007, so is now not all that new. When you do document automation, you generally also want support for:
conditional inclusion/exclusion of content
repeating data (eg table rows, list items)
There are ways to do both of the above with content controls, and Microsoft has recently released dedicated support for repeating data.
Content control data binding is less brittle than older approaches, but if you don't like XML and XPath, you might not call it elegant :-)
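To make the data-binding idea concrete, here is a minimal sketch in Python of how generation could look once a template's content controls are already bound: the document body is left alone and only the custom XML part is swapped. The part name `customXml/item1.xml`, the `invoice` root element, and the field names are assumptions for illustration; they have to match whatever XPaths your own template's bindings use. As far as I know Word refreshes bound controls from this part when it opens the file, but other consumers may show the cached text until the bindings are applied.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Assumed part name: Word commonly writes customXml/item1.xml, but check your template.
CUSTOM_XML_PART = "customXml/item1.xml"


def build_custom_xml(fields):
    """Serialise a flat dict as the XML the content controls are bound to.

    The <invoice> root and the child element names are hypothetical; they must
    match the XPath expressions stored in the template's data bindings.
    """
    items = "".join(
        f"<{name}>{escape(str(value))}</{name}>" for name, value in fields.items()
    )
    return f"<invoice>{items}</invoice>".encode("utf-8")


def fill_template(template_bytes, fields):
    """Return a copy of the .docx in which only the custom XML part is replaced."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == CUSTOM_XML_PART:
                data = build_custom_xml(fields)
            dst.writestr(item, data)
    return out.getvalue()
```

Usage would be along the lines of `fill_template(open("template.docx", "rb").read(), {"company": "Acme"})`, writing the returned bytes to a new file.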
The legacy approaches "baked-in" to Word include:
fields: MERGEFIELD, form fields, or custom fields
use of bookmarks
A problem with these is that it is easy for the author user to mistakenly delete them. Another problem is that they aren't as easy for a developer to work with at the XML level (though maybe OK via VBA or VSTO).
The other approach is to include magic text strings on the document surface, and there are various tools around which do this. This has an initial appeal, but the problem with this is that what looks like a contiguous text string on the document surface can be split up at the XML level, because of:
spelling/grammar correction
rsid (edits made at different times or by different people)
formatting (eg font size, appearance changes)
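The run-splitting problem is easier to see in markup. The sketch below uses a made-up paragraph in which `{{company}}` looks contiguous in Word but is stored as three runs, so no single run contains the marker. The `{{name}}` marker syntax and the sample XML are assumptions for illustration. The workaround shown (join the run text, substitute, write the result back into the first run) is only one possible approach, and it discards any formatting differences between the runs it merges.

```python
import re
import xml.etree.ElementTree as ET

NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}

# Hypothetical marker syntax: {{name}}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

# Made-up paragraph: the marker is split by a proofing mark and an rsid change.
PARAGRAPH = """
<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:r><w:t xml:space="preserve">Dear {{comp</w:t></w:r>
  <w:proofErr w:type="spellStart"/>
  <w:r w:rsidR="00A1B2C3"><w:t>any</w:t></w:r>
  <w:proofErr w:type="spellEnd"/>
  <w:r><w:t>}},</w:t></w:r>
</w:p>
""".strip()


def run_texts(paragraph):
    """Text of each w:t element, in document order."""
    return [t.text or "" for t in paragraph.iterfind(".//w:t", NS)]


def placeholders(paragraph):
    """Marker names found once the run text is joined back together."""
    return PLACEHOLDER.findall("".join(run_texts(paragraph)))


def replace_in_paragraph(paragraph, values):
    """Substitute markers over the joined text and collapse it into the first w:t."""
    texts = paragraph.findall(".//w:t", NS)
    if not texts:
        return
    joined = "".join(t.text or "" for t in texts)
    texts[0].text = PLACEHOLDER.sub(
        lambda m: values.get(m.group(1), m.group(0)), joined
    )
    for t in texts[1:]:
        t.text = ""  # later runs are emptied; their formatting no longer applies
```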
In short, yeah, there are alternatives. Choose your poison.
I need to know how MS Word paginates a document. Is there a way to get correct MS Word pagination (the correct XML tags, such as lastRenderedPageBreak) when creating a file with OpenXML, without opening it in MS Word?
Pages are not represented in the OpenXML specification until a word processor renders the document. Once rendered, any page annotations reflect content positioning as it was represented by the last word processor. (Meaning the <lastRenderedPageBreak/> node should be viewed as potentially volatile.)
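If you want to see what the last renderer left behind, a small sketch like the following can count the page-related markers in the main document part (normally `word/document.xml` inside the .docx package). The function name is my own; the counts are only hints from whichever word processor last saved the file, not an authoritative page count.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def page_break_hints(document_xml):
    """Return (soft, hard) page-break counts found in the markup.

    soft: <w:lastRenderedPageBreak/> elements written by the last renderer.
    hard: explicit <w:br w:type="page"/> breaks inserted by the author.
    """
    root = ET.fromstring(document_xml)
    soft = sum(1 for _ in root.iter(f"{{{W}}}lastRenderedPageBreak"))
    hard = sum(
        1 for br in root.iter(f"{{{W}}}br") if br.get(f"{{{W}}}type") == "page"
    )
    return soft, hard
```

A file generated purely through the SDK and never opened in a word processor would typically report zero soft breaks, which is the limitation described above.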
It is not impossible to calculate page positions on your own, but there is a substantial amount of work involved.
There are ways to work around this limitation, depending on why you need the page information. (E.g. form fields for page numbers, bookmark fields for TOCs, splitting/merging by section rather than page.)
For more information on verifying how this shakes out in the raw markup, see this answer.
If you are able to comment with some more information, we may be able to point you toward a possible workaround.
I want to develop an ASP.NET web application that does the following:
a) The user should be able to add content to the document. The content can include text as well as images, screenshots, etc.
b) The user should be able to search by keywords. When searching by keyword, the matching content, along with any images, should be shown to the user.
I am not sure what the proper approach is. One way I can think of is to store the text content in an XML file and later search for keywords by going through each node of the XML and displaying the matches, but I am not sure how to attach image content to XML. This method also doesn't seem nice or efficient if the document grows a lot over time.
Can anyone suggest a proper way to meet the above requirements? Any hint would be appreciated.
Split it into two tasks: editing and search.
Full-text search is a solved problem. Simply use Sphinx Search and you are done. Sphinx is simple to use and can do everything you will need. It has a MySQL interface (your app connects to Sphinx the same way it would to a second MySQL database).
Editing is a bit more complicated. If I understand correctly, you want multiple users to edit a single document concurrently.
I recommend using websockets to notify other clients about changes in the document. Long-polling and Server-Sent Events have ugly side effects, such as stopping the browser from making other requests to the server. To implement the client side in JavaScript, I would use React, Angular or a similar framework to make updates as easy as possible.
The server side requires a modification-friendly representation of the document, so that if one user changes one part and another user changes another part, your app is able to merge the changes. Changing completely different parts is easy, but it may be tricky when two users change the same paragraph or document node. The exact representation of each change depends on the format of your document.
I do not see much benefit in using XML rather than any other format. It may be practical for document representation, but it will not help with merging colliding modifications. I would start with a plain array of strings, each representing a single paragraph. Extending it to a full XML document is the easy part, once two users can edit the same paragraph.
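As a rough sketch of the "plain array of strings" idea, here is a paragraph-level three-way merge in Python. It assumes both users started from the same base version and only changed paragraph text (no paragraphs added or removed); the function name and return shape are my own invention. Edits to different paragraphs merge cleanly, while edits to the same paragraph are reported as conflicts and left at the base text for the application to resolve.

```python
def merge_paragraphs(base, mine, theirs):
    """Three-way merge of documents stored as lists of paragraph strings.

    Returns (merged, conflicts), where conflicts lists the indexes of
    paragraphs that both sides changed in different ways.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place paragraph edits")

    merged, conflicts = [], []
    for index, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:          # both sides agree (or neither changed it)
            merged.append(m)
        elif m == b:        # only the other user changed it
            merged.append(t)
        elif t == b:        # only this user changed it
            merged.append(m)
        else:               # both changed it differently: keep base, flag it
            merged.append(b)
            conflicts.append(index)
    return merged, conflicts
```

Resolving the flagged paragraphs (for example with a character-level diff) is the tricky part mentioned above and is deliberately left out.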
To store images in XML, simply store the files using their hash as the file name and then use that name to link the file in the XML. Git does the same thing and it works nicely. You may want to count references to identify unused files.
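A minimal sketch of that hash-named storage, in Python. The function names and the directory layout are assumptions for illustration, and SHA-256 is my choice here (Git itself has historically used SHA-1). Identical uploads end up as one file, and a simple set difference finds files that no document references any more.

```python
import hashlib
from pathlib import Path


def store_image(data, suffix, store_dir):
    """Save image bytes under a name derived from their hash and return that name.

    The returned name is what you would put into the XML to link the image.
    """
    directory = Path(store_dir)
    directory.mkdir(parents=True, exist_ok=True)
    name = hashlib.sha256(data).hexdigest() + suffix
    path = directory / name
    if not path.exists():  # identical content is stored only once
        path.write_bytes(data)
    return name


def unused_files(store_dir, referenced_names):
    """Names of stored files that no document refers to any more."""
    return sorted(
        p.name for p in Path(store_dir).iterdir() if p.name not in referenced_names
    )
```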
I have an ASP.NET app in which I need to edit a Word document and then send that document by email as an attachment.
I would like to know the best way to edit the Word document and then use it.
The document already has data, and there are a few variables such as "company name", "date", "amount", etc. that I search for in the document and replace with values from within the code.
The code works great when I run it locally, but some people have told me that I shouldn't edit a Word document on the server the way I do now, and that I should instead use either OpenXML or Google Docs to edit the document.
Any idea what's the best way to tackle this?
I would vote for OpenXML, but be prepared to spend a good day or two reading how to use the API for .NET and be patient. =)
I remember using this tool -
http://openxmldeveloper.org/resources/dotnet/m/cc/303.aspx - quite a bit to find the relevant parts in the document to modify. You basically load a Word document and can "drilldown" to find the parts you want to modify. You can actually write some pretty clean code to search the document for your textual markers and then replace them with data.
(I hope I understood the question correctly. You said you already had working code, so I wasn't sure what the question was.)
You can use the Open XML Format SDK as per http://msdn.microsoft.com/en-us/library/dd440953%28v=office.12%29.aspx
For what you're doing though, I think your approach is fine.
I have a fair number of screwdrivers, but if I noticed a loose screw in the stool in front of me, I might just use the knife on the table because it will do the job perfectly adequately and save me a trip to the toolbox. It's not a tool designed for that job, but it is a tool that would do the job just as well and with less effort.
Now, if I decided to set about a day's worth of DIY with only the knife instead of the set of screw-drivers, that would be going to the other extreme. Here I'd have long-ago crossed the line where using the tools designed for a given job would have made my life much easier.
It's just the same with software tools.
One of the very points of XML formats is that we can do simple tasks with it treating it just as text. Yeah, we none of us want to be the guy with a 3-page-long regular expression with which they're trying to parse a complicated XML document, but when the problem naturally breaks down to a simple text substitution, do a simple text substitution.
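For what it's worth, the "simple text substitution" can be done on the .docx itself without Word installed, since the file is a ZIP of XML parts. The sketch below, in Python rather than .NET, is only an illustration under a few assumptions: the marker strings (e.g. `[[company name]]`) are hypothetical, the main part is `word/document.xml` encoded as UTF-8, and each marker sits in one contiguous piece of text in the XML. Values are XML-escaped so that an ampersand in a company name doesn't corrupt the document.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def substitute(docx_bytes, values, part="word/document.xml"):
    """Return a copy of the .docx with each marker replaced by its escaped value."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == part:
                xml = data.decode("utf-8")  # assumes the part is UTF-8
                for marker, value in values.items():
                    xml = xml.replace(marker, escape(value))
                data = xml.encode("utf-8")
            dst.writestr(item, data)
    return out.getvalue()
```

If a marker stops being replaced after someone edits the template, the likely cause is that Word has split it across several XML elements, at which point a proper Open XML approach earns its keep.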
I searched the web and got a few results, but none of them seems to fit the task. I was looking for possibilities for .NET, but would also like to know how Java/PHP/etc. developers handle tasks like this.
As far as I found out, I have the option to:
Use MigraDoc/PDFSharp and go the "code" way, without any visual designer
I could use HTML and convert it to a PDF (which is the best approach in theory, but practically it's awful to get good looking HTML 1:1 into a PDF file)
I could use some weird MS Word templating/batch stuff
LaTeX?
What are your solutions?
We use SoftArtisans OfficeWriter
A solution that we settled on in a previous project was XSL-FO. Although it did not have a visual designer, we found it to be very developer friendly and more suitable to run in a server type environment. It also deals with document "flow" a lot better than most of the reporting software that offer a designer. I do know that we had a lot of trouble with Crystal Reports around deployment, COM exceptions being thrown and limitations on how many reports can be generated concurrently. One downside to using XSL-FO is all the syntactic sugar that comes with XML.
This question lists a few XSL-FO engines.
Regarding your "3.) weird MS Word templating/batch stuff":
I love to use Aspose.Words, a commercial package to create/edit/export Microsoft Office Word documents, without any Office components being installed.
Aspose.Words is capable of doing Mail Merge stuff and writing PDF files, so I often start on my desktop computer with a DOC that I edit in Word and use this with Aspose.Words on my server to produce PDFs.
One method I've used before for Windows desktop applications is to use XAML/WPF. The nice thing about this solution is that there are a lot of good tools and documentation around building layouts with XAML. Then you just pass the canvas to a PrintDialog and you're done. If you've been doing a lot with WPF/XAML already this is a very easy solution and I've had a lot of success with it. I learned most of what I needed to get started here: http://www.switchonthecode.com/tutorials/printing-in-wpf
The downside, of course, is your dependency then on .NET and WPF.
Similar to Matt F's solution of using Crystal Reports, I use SQL Server Reporting Services. You can add an .rdlc file to your solution and use the WYSIWYG editor to design your report. Then in your code, all you have to do is assign your data source to your report and it should work. This even supports exporting to PDF.
Since no one seems to have mentioned LaTeX-based solutions: there was a Stack Overflow TeX question answered by jason. Short version: uses MiKTeX, beautiful documents, big hassle to use/build/maintain.
Thanks for all your answers...
I finally decided to implement my own solution using Visual Studio 2010 and the Office-Tools... This is not the "perfect" solution, but it was easy & fast to implement, while I still have the flexibility to change the documents with Excel or Word...
Downside of course: You need Office installed.
It depends on how you get your template documents. For example, if you have others in your organization responsible for generating the "standard" invoice document, you'll probably have a solution that involves mail merges in the Microsoft Word API, because you need to work with Word-formatted input files. Alternately, if you are merely given the specs for the appearance of the document ("Logo in the top-right, 5 inches down, then a horizontal line two inches below that, then... etc."), you could do it entirely in code. Even if you're designing a solution from scratch, take into account who your document suppliers WILL be, and plan accordingly. Finally, if this is from-scratch for a small set of documents that won't change much (i.e., you're starting your own software company and want to send invoices), don't do it. Just buy something off the shelf or use Word :)
We use XAML FixedPage. You can use a designer like Kaxaml, it has a lot of layout flexibility, and data binding works great with dynamic objects like Expando. In code we bind a DataContext and then render that to XPS. Since we need the final output to be PDF we use GhostXPS, which is free but has to be executed in a separate process; there are third-party fully managed converters for XPS to PDF, though.
We use Crystal Reports which comes free with Visual Studio. You can easily create a report/document that is bound to a database or unbound.
For example you could suppress the header and footer, expand the details section to be approx. A4 size, then add either bound fields or use parameters for unbound content. Then at runtime for bound documents set the selection formula to only pull in data for one transaction or for unbound documents just pass in the parameters.
A nice feature of Crystal Reports is there are export features, so export to PDF, Word, etc. Also it's easy to auto print to a specified printer.
Crystal Reports can be a pain! At a basic level, the outsourced developers of our in-house software (works orders, invoices, etc.) use DevExpress, although I think it can be pricey.
For reports generated by the software, I ended up choosing to export to a raw CSV file, which of course can be opened by any spreadsheet software.
I have an existing XPS file that I would like to use as a template and possibly bind data to it. I have tried several methods, but cannot seem to get it to work.
Does anyone have any experience altering an existing XPS file to add data at runtime and then print or save?
Any help is appreciated.
XPS documents are packaged using the Open Packaging Conventions, the same ZIP-and-XML container that Open XML documents use. There is an SDK for working with these docs. Here is a How-to article by Beth Massi: "Accessing Open XML Document Parts with the Open XML SDK".
Since you are working with the internal doc structure you might also check out "Open XML Package Editor", which lets you explore the doc with Visual Studio. Here is another How-to by Beth Massi: "Handy Visual Studio Add-In to View Office 2007 Files".
+tom
It's a bit of a challenge to do this with XPS, but it is possible.
You can do this with our NiXPS SDK.
I've posted an example on my blog a while ago:
XPS variable data example
Regards,
Nick
Bindings are evaluated during the process of writing to an XPS document. So you can't set up a {Binding} in a FixedDocument, Write that FD to an XpsDocument, and expect to get that original FD back again when you next open that saved doc.
Also, the standard XpsWriter does convert everything into Glyphs on canvases, so you can't, say, put a textbox in the original and expect to be able to find it after it's been saved to a document.
I've never used the NiXPS libraries, so if Nick says it can be done you might want to check it out.
One last possibility: you can create placeholders in a form that you will be able to find later. They'd have to be text (something like [[{{FORMFIELDHERELOL}}]]) with some kind of delimiter scheme to differentiate the text from everything else. You could then go spelunking in the XML looking for text that fits the delimiter pattern and switch out those glyphs for your binding text. Of course, the issue with THAT is that if you aren't putting X chars in place of X chars you might find you have to do some repositioning. As it's all glyphs on canvas this might be slightly harder than, say, threading a needle with a shoelace.
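The spelunking step could look something like this sketch (Python, just to show the idea). It scans one FixedPage part for Glyphs elements whose UnicodeString attribute contains a `[[{{NAME}}]]` placeholder and reports where each one sits. The sample markup and function name are made up, and it assumes the writer kept each placeholder inside a single Glyphs element, which is not guaranteed. Actually swapping the text is the hard part: the Indices attribute, font subsetting and positioning would all need attention, so that is left out.

```python
import re
import xml.etree.ElementTree as ET

# Delimiter scheme from the answer above: [[{{NAME}}]]
PLACEHOLDER = re.compile(r"\[\[\{\{(\w+)\}\}\]\]")

# Made-up FixedPage fragment for illustration.
SAMPLE_PAGE = """
<FixedPage xmlns="http://schemas.microsoft.com/xps/2005/06" Width="816" Height="1056">
  <Canvas>
    <Glyphs OriginX="96" OriginY="120" FontRenderingEmSize="16" UnicodeString="Invoice for" />
    <Glyphs OriginX="96" OriginY="150" FontRenderingEmSize="16" UnicodeString="[[{{COMPANY}}]]" />
  </Canvas>
</FixedPage>
""".strip()


def find_placeholders(fixed_page_xml):
    """Return (name, OriginX, OriginY) for each Glyphs element holding a placeholder."""
    root = ET.fromstring(fixed_page_xml)
    hits = []
    for element in root.iter():
        # Compare on the local name so the exact namespace URI doesn't matter.
        if element.tag.rsplit("}", 1)[-1] != "Glyphs":
            continue
        match = PLACEHOLDER.search(element.get("UnicodeString", ""))
        if match:
            hits.append(
                (match.group(1), element.get("OriginX"), element.get("OriginY"))
            )
    return hits
```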