Get height of rendered text and images in MS Word - c#

I'm creating a newspaper authoring system. Currently I use the Aspose.Words library to generate the newspaper in Docx format, based on a lot of other documents as input.
The basic idea is to load many article documents into a List, then generate a final docx containing the newspaper.
We need to get the total height of a text (with images and tables) inside columns. Because libraries like Aspose.Words treat the Docx format as a DOM, there is no way to know how text will be arranged inside columns, so I can't know the real height.
We've worked out our own way to get this height. I'm using the Graphics.MeasureString() method from the System.Drawing namespace. It returns the width and height used by a string, and from that I can estimate how many lines (and points or inches) the text will take up inside a column.
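To illustrate the kind of estimate described above, here is a minimal sketch of a greedy word wrap. It is written in Python rather than C# purely for illustration. The `measure` callback is a hypothetical stand-in for a call such as Graphics.MeasureString, and the fixed 6-point character width is an assumption made only so the example runs on its own. It ignores images, tables, hyphenation and kerning, which is part of why such an estimate is rough.

```python
from typing import Callable


def estimate_column_height(text: str, column_width: float, line_height: float,
                           measure: Callable[[str], float]) -> float:
    """Estimate the height of `text` wrapped into a column of `column_width`.

    `measure` returns the rendered width of a string (hypothetical stand-in
    for a real text measurement call). The result uses the units of
    `line_height`.
    """
    lines = 0
    for paragraph in text.split("\n"):
        words = paragraph.split()
        if not words:
            lines += 1  # an empty paragraph still occupies one line
            continue
        current = words[0]
        for word in words[1:]:
            candidate = current + " " + word
            if measure(candidate) <= column_width:
                current = candidate  # the word still fits on this line
            else:
                lines += 1           # close the line, start a new one
                current = word
        lines += 1                   # the last line of the paragraph
    return lines * line_height


def fixed_width_measure(s: str) -> float:
    # Assumption for the example only: every character is 6 points wide.
    return 6.0 * len(s)


height = estimate_column_height("aaa bbb ccc", 42.0, 12.0, fixed_width_measure)
```

A word wider than the column is counted as a single line here, so overflow is not modelled.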
But this estimate is very poor and we need a more decent solution. We are thinking of using the OpenXML SDK to get this height; can we?
Aspose.Words doesn't offer a way to get it, and all the rendering classes are private to the library.
Can you think of another way to get this height?
Thank you,
Daniel Koch

This property isn't exposed in Open XML or the SDK (or VBA/VSTO for that matter). How exactly the height is calculated is not documented anywhere. The way you are doing it may well be the way to proceed.
Another possible way is to put your TextColumns in a Table Column/Cell and grab that height (but if it is two text columns in the cell and the first one "fills" the cell top to bottom and the second one doesn't, you'll still have the issue of not being able to calculate the size of the second one).

I have almost the same problem as you, but in my case I'm dealing with questions inside a test exam.
At the moment we are using RTF to build the questions and a RichTextBox to measure the height, like this: http://blogs.technet.com/david_bennett/archive/2005/04/06/403402.aspx
I want to migrate to DOCX, but I've had no luck so far working out how to measure a question with tables and images. :-(
Right now I'm studying the Document members (http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word._document_members.aspx), to try to do it with Word Automation.
Regards,
Bruno

Thanks to all for the answers.
I ended up replacing Aspose.Words with PDFLib. Now I can control pages, columns or anything else using PostScript points.
We keep Aspose.Words only for content import, but it isn't suited to printing a newsletter.

Related

How to add a footer as a watermark so that it can be removed later

I have some scanned PDF documents (pretty flat, no selectable text, tags, objects, etc) and I would like to add a footer that can also be removed after being added. However, if it overwrites on top of anything, I want to remove the footer only. We can assume that, after the watermark is added, it won't be rescanned, changed, or flattened. (I should mention, in case any iText employees see this question, that my organization has recently purchased a license but I just started this project and I am waiting to have it sent to me so I can register for official support.)
I found an excellent answer for adding and removing watermarks here: iText 7 - Add and Remove Watermark on a PDF. My problem, as stupid as it might sound, is that I'm really struggling to get the variables right, even after lots of trial and error. The scanned documents appear as portrait when viewed in a PDF viewer, but they have a rotation of 270, so PdfDocument.GetPage(i).GetPageSize() and GetPageSizeWithRotation() return the height and width swapped relative to each other. I need to take this into account, but I also don't want to assume that this is always the case. The footer should be centered at the bottom of the page.
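The geometry involved can be sketched independently of any PDF library. The following is only an illustration in Python, not iText code: `footer_anchor` is a hypothetical helper, and it assumes the page box passed in is the unrotated one (as GetPageSize() appears to return) and that the page rotation is a clockwise display rotation in multiples of 90 degrees, as the PDF format defines it. It returns where, in unrotated page coordinates, a footer should be centered so that it appears at the bottom center of the page as displayed, plus the counter-clockwise angle by which the text would need to be rotated to read upright.

```python
def footer_anchor(left, bottom, width, height, rotation, margin):
    """Return (x, y, text_angle_degrees) for a bottom-centered footer.

    left, bottom, width, height describe the unrotated page box.
    rotation is the page's clockwise display rotation (0, 90, 180, 270).
    margin is the distance from the visible bottom edge to the footer.
    """
    rotation %= 360
    if rotation == 0:
        # Visible bottom edge is the box's bottom edge.
        return (left + width / 2, bottom + margin, 0)
    if rotation == 90:
        # Visible bottom edge is the box's right edge.
        return (left + width - margin, bottom + height / 2, 90)
    if rotation == 180:
        # Visible bottom edge is the box's top edge.
        return (left + width / 2, bottom + height - margin, 180)
    if rotation == 270:
        # Visible bottom edge is the box's left edge.
        return (left + margin, bottom + height / 2, 270)
    raise ValueError("rotation must be a multiple of 90 degrees")


# A landscape box with rotation 270 is displayed as a portrait page.
x, y, angle = footer_anchor(0, 0, 792, 612, 270, 20)
```

Because the helper branches on the actual rotation value, it does not assume that every scanned page is rotated by 270.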
The method signature can be as in the link provided (https://stackoverflow.com/a/45225597):
public static void WatermarkPDF(string sourceFile, string destinationPath)
Thank you in advance for the help and support.
Okay, BIG TIME EDIT: requirements are changing. In fact, they want to be able to have 2 lines of text as a left-aligned header, with the ability to remove or replace either or both, AND additionally have a right-aligned footer that can also be removed or replaced. I'm no longer sure this should be implemented as a watermark. Again, I can assume that, once I add the headers and/or footers, the document won't be reflattened or edited in any major way, so if they are added as elements, they should be removable as elements. The problem is that the scanned documents have no structure to begin with (at least they don't seem to so far), so there's no parent element, tag, or whatever.

Infragistics XamDataChart: How to use NumericXAxis for CategorySeries like ColumnSeries, AreaSeries and so on?

Title of my question probably is descriptive enough, yet here is the rest:
I need to plot histograms and area charts for some numeric values. I already managed to do it partially using a CategoryXAxis, the only option that ColumnSeries supports.
Unfortunately the labels don't display well enough (position, spacing, ...), and the tick marks are meant for category data and are drawn between columns.
As the task looks really trivial, and I need to do this many times with different options, I was thinking there might be a way to do it using a NumericXAxis, which is pretty simple and meant to be used for numeric data.
Unfortunately I don't see a way to use a NumericXAxis with a ColumnSeries, so I hope someone can help me with this.
Requirements: I use .Net Framework 4.0, C#, WPF
Note: I'm more interested in Code-Behind rather than XAML.

Update/Replace Shapes in visio diagram programmatically (C#)

I would like to programmatically (in C#) update/replace all shapes of a given Visio flowchart (*.vsd). The diagram layout remains the same (all connections, coordinates etc. are the same), but the master shapes should be different (from a different stencil).
Any examples, suggestions and ideas are highly appreciated.
Thanks for your suggestions! The source diagram has many protected shapes that are grouped (with multiple subshapes), so I guess it will be better to take all the information for a given source shape, then drop a new master from the target stencil and set these properties on it. Next, I would take the next shape and do the same. I would create a new Visio document, since I'm not sure whether the source Page ShapeSheet has been customized in some way. But I don't know how to do the basic steps programmatically in C#, e.g.
- how to create a new vsd file within C# (maybe application.Documents.AddEx(""))
- must I then open this document with application.Documents.OpenEx, or is the document already open/active
- must I create a new Page within this document
- …
In this post: "save and close visio documents visual basic macro" similar steps are explained, but in VBA and not in C#.
I'd suggest just using Visio 2013, which has that function out of the box.
However, that's probably not going to work for you. I've taken two different routes in the past, depending on what differences there were between the original and the replacement shape.
One way to do this is to copy as many attributes as you can between the shapes and duplicate the glues and everything else. To do this, you copy the width, height, pins, etc., then step through all the glues in the original shape and move them to the new shape.
The other way, which is a bit cleaner, in my opinion, is wholesale copying all the geometry sections from the original into the destination shape. This makes it so you don't have to worry about glues and formatting and things, and are just copying over the graphics that make up the shape.
If you have a grouped shape with multiple subshapes, it's probably going to be easier to drop a new master out, but if it's just a simple graphic-type shape, copying the geometry is probably better.
One thing to be aware of with the "copy the geometries" method is that you have to make sure any user cells or controls which are precedents of any geometry cells in the new shape also exist in the original shape. Visio's Cell class tells you the precedents for a cell, so this is easy enough to do.
Hope that helps.
I think you can extract some information from these two links and play with it:
Visio shape - get X,Y position
http://msdn.microsoft.com/en-us/library/cc160747.aspx

How to create a Simple tag cloud? Using C# and Styling with css

I'm finding it impossible to create a tag cloud; I can't find any walkthroughs or tutorials (ones that work, at least).
I'm just looking for a simple, basic example of a working tag cloud and I can spice it up after that.
The best link I found is:
http://www.geekzilla.co.uk/View960C74AE-D01B-428E-BCF3-E57B85D5A308.htm
But it's outdated, I can't download the source file, and there are many gaps in the code.
This isn't a really hard problem. Essentially a tag cloud is just a way of linking the font size to how common the tag is.
First thing is how often does the tag appear:
select Value, Count(*)
from Tag
group by Value
order by Count(*)
Then when you render this resultset to the page, have some sort of algorithm to take the count for each tag and represent it as a font size. A naive approach would be to set the fontsize directly to the count, but that's likely to lead to unreadable results. Instead, perhaps just have the top 10% as a large font, the next 10% as the next fontsize down, etc. You'll have to work out an exact algorithm that works for you and your data, though.
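A minimal sketch of that bucketing idea follows, in Python for brevity. The function name, the five font sizes and the 10% bucket width are all assumptions chosen for the example; the input is simply the tag-to-count mapping produced by a query like the one above.

```python
def font_sizes(tag_counts, sizes=(32, 26, 20, 16, 12), bucket_percent=10):
    """Map each tag to a font size based on its rank by count.

    The top `bucket_percent` of tags get sizes[0], the next slice gets
    sizes[1], and so on. Everything past the last slice gets the
    smallest size.
    """
    # Most common first; ties are ordered by tag name so the result is stable.
    ranked = sorted(tag_counts, key=lambda tag: (-tag_counts[tag], tag))
    total = len(ranked)
    result = {}
    for rank, tag in enumerate(ranked):
        # Integer arithmetic avoids floating point surprises at bucket edges.
        bucket = (rank * 100) // (total * bucket_percent)
        result[tag] = sizes[min(bucket, len(sizes) - 1)]
    return result


counts = {"a": 10, "b": 9, "c": 8, "d": 7, "e": 6,
          "f": 5, "g": 4, "h": 3, "i": 2, "j": 1}
cloud = font_sizes(counts)
```

Note that ranking by position means two tags with the same count can land in different buckets; grouping ties together is one of the details you would have to tune for your data.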
Also, tag clouds are really a bit rubbish from several points of view (readability, searching, accessibility). Make sure the tag cloud isn't the only way to get access to the tags. Perhaps in alpha order or by Count(*) on a dedicated page.
Use the TermCloud from the Google Charts API. It's very easy to use and it renders beautifully.

calculate html page breaks (html 2 pdf) server side for precise print layout with headers and footers

We print pdf books generated through a html to pdf application.
There is a header and footer on each page, and we place content exactly, using production and translation restrictions (and layout variations) for different languages to ensure that the fixed content for each page fits.
So for example, although our content is dynamic, a paragraph is expected to take approximately the same amount of space for the same place in the book. We sometimes change style and layout attributes for translations, but the same rules about like sizes apply.
We have a header and footer on each page, and the entire book is rendered as one long html page, using css line breaking to force each header onto a new page. So, in effect, we control the fixed content height per page server side.
This works well, and we are very happy with the advantages that HTML affords us in presentation (designers rather than programmers can design pages, etc.). We are also heavily invested in this tech and in too deep to change direction now, so we are not able to change our technology: we are using html 2 pdf and we need to make this work as well as possible. That is not to say we could not mix tech, but...
The problem is this: we now have some variable-sized content that we have no prior control over. To us it is text, so we have control over its formatting, but not its quantity. We also have headings of different sizes.
We need a way to calculate page breaks, leaving as little white space as possible, and I would love to know how anyone else is dealing with this. I know this will not be an exact science, but I still need the best approach possible.
We have total control over the rendering/layout engine; it is always IE8 compatible, so different browsers need not be considered.
These are my thoughts, would love to hear yours:
1. This is our current method: assign a number of lines per page (variable by font size and font, to allow for different locales). Each block of content is costed at n lines, and this figure is used to calculate page breaks.
Pro: simple.
Con: inaccurate, none of our fonts are monospaced, and it needs configuring for every locale.
2. Render each consecutive page of free-flow content into a web page, in a div of the exact page width (fixed div), and let it flow to whatever vertical height it requires. Using an html 2 bmp solution, capture an image and use the height of the rendered image (edge detected and cropped if required) to calculate the required number of pages.
Pro: could be accurate, and not too expensive if free-flow content is kept contiguous.
Con: incomplete solution. Once I know the required number of pages, how do I know where to break the html? Measuring each page using this method and edge detecting would be very expensive.
3. On a font-by-font basis, knowing in advance the font sizes, padding and margins of text and headings, calculate width, line breaks and height character by character, using width data extracted from the font file.
Pro: once all the data had been extracted, and margins had been added for differences in HTML rendering, this could likely be fairly accurate.
Con: highly intricate and sensitive to style sheet changes.
4. Could we use a WebBrowserControl to somehow measure the content?
Love to hear your thoughts and suggestions.
EDIT....
Our pdf converter is Winnovative, which runs within a .net Windows service, our html feed however is generated in PHP.
Kindly refer to the manual
http://www.winnovative-software.com/manual%5CHTML%20to%20PDF%20Converter%20for%20.NET%20-%20Developer%27s%20Manual.htm
point 5.1.
Hope this solution helps you.
Note: the internal links are not working, so kindly manually navigate to the desired point.
This question is old, but I am doing the same basic thing as you. What I've found is that line number counting is still important, but you can use the css style line height to standardize the size of each line. (height for tr's if html is table based). This should allow you to have a constant number of lines per page.
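As a rough sketch of how a fixed line height turns into page breaks (in Python for brevity; the function name and the numbers are assumptions for the example): once every line has the same height, the number of lines per page is a simple division, and blocks costed in lines can be assigned to pages greedily.

```python
def paginate(block_lines, page_height, line_height):
    """Group blocks into pages by their cost in lines.

    block_lines: number of lines each block of content occupies.
    page_height: usable content height of a page (header/footer excluded).
    line_height: the fixed line height, in the same units as page_height.
    Returns a list of pages, each a list of block indexes.
    """
    lines_per_page = int(page_height // line_height)
    if lines_per_page < 1:
        raise ValueError("page is shorter than a single line")
    pages = []
    current = []
    used = 0
    for index, cost in enumerate(block_lines):
        # Start a new page when the block would not fit on the current one.
        if current and used + cost > lines_per_page:
            pages.append(current)
            current = []
            used = 0
        current.append(index)
        used += cost
    if current:
        pages.append(current)
    return pages


# 640 units of content height at a 16 unit line height gives 40 lines per page.
pages = paginate([10, 20, 15, 30, 5], page_height=640, line_height=16)
```

In this sketch a block is never split: one that is longer than a whole page simply gets a page to itself and overflows, so splitting long blocks would need extra handling.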
Did you come up with a solution that worked for you?
