iTextSharp vertical page fill

I'm trying to find something like \vfill from LaTeX in iTextSharp (a way to insert vertical space so that text is placed at the bottom of the page). It's only for one page, not a footer.
I searched online and in the book iText in Action, but didn't find any answers.

OK, after a long time and trying many things, I have found a solution.
Some things I tried that "worked", but not good enough:
First I calculated the height of my paragraph (by writing it into a new table in a new in-memory document), then added newlines until there was just enough room left for my text. Result: NOT a good way; the text would be off by a few points (its y position in the document, because of the newlines).
Then I tried to do this with ColumnText: too many calculations (since my document is dynamic), and I didn't like positioning it absolutely.
So my solution is to use a PdfPTable:
var t = new PdfPTable(1);
t.ExtendLastRow = true;
t.WidthPercentage = 100;
var c = new PdfPCell();
c.VerticalAlignment = Element.ALIGN_BOTTOM;
c.DisableBorderSide(Rectangle.BOX);
var p = new Paragraph("This is a test. This is a test. This is a test. This is a test. This is a test. This is a test. This is a test. This is a test. This is a test.");
p.Alignment = Element.ALIGN_JUSTIFIED;
c.AddElement(p);
t.AddCell(c);
doc.Add(t);
Pretty simple, but I lost a lot of time on this. I hope this helps others too.

Related

PDF Table Structure

I have a PDF file with a tabular structure, but I am not able to store it in a database because the text in the PDF file is in the Mangal font.
So two problems occur to me:
Extract table data from PDF
Text is in Marathi language
I have managed to do this for English with the following code:
ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, i+1, strategy);
text.Append(currentText);
string rawPdfContent = Encoding.UTF8.GetString(Encoding.Convert(Encoding.UTF8, Encoding.UTF8, pdfReader.GetPageContent(i + 1)));
This gives me the tabular structure, but only for English text; I want to know how to do the same for Marathi.

Funnily enough, requirement no. 1 is actually the hardest.
In order to understand why, you need to understand PDF a bit.
PDF is not a WYSIWYG format. If you open a PDF file in notepad (or notepad++), you'll see that it doesn't seem to contain any human-readable information.
In fact, a PDF contains instructions that tell a viewer program (like Adobe Reader) how to render the page.
So instead of having an actual table in there (like you might expect in an HTML document), it will contain stuff like:
draw a line from .. to ..
go to position ..
draw the characters '123'
set the font to Helvetica bold
go to position ..
draw a line from .. to ..
draw the characters '456'
etc
See also How does TextRenderInfo work in iTextSharp?
In order to extract the table from the PDF, you need to do several things.
implement IEventListener (this is an interface whose implementations you can attach to a parser instance; the parser goes over the entire page and notifies all listeners of events such as TextRenderInfo, ImageRenderInfo and PathRenderInfo)
watch out for PathRenderInfo events
build a datastructure that tracks which paths are being drawn
as soon as you detect a cluster of lines that is at roughly 90° angles, you can assume a table is being drawn
determine the biggest bounding box that fits the cluster of lines (this is known as the convex hull problem, and one algorithm that solves it is the gift wrapping algorithm)
now you have a rectangle that tells you where (on the page) the table is located.
you can now recursively apply the same logic within the table to determine rows and columns
you can also keep track of TextRenderInfo events, and sort them into bins depending on the rectangles that fit each individual cell of the table
This is a lot of work. None of this is trivial. In fact, this is the kind of stuff people write PhD theses about.
iText has a good implementation of most of these algorithms in the form of the pdf2Data tool.
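The last step of the list above (sorting text into bins per cell) can be sketched without any PDF library. This is only an illustration under strong assumptions: the table is a regular grid, the x positions of the vertical rules and the y positions of the horizontal rules have already been detected, and each piece of text comes with the position where it was drawn. TableBinner and its tuple shape are hypothetical names, not part of iText.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TableBinner
{
    // Returns i such that edges[i] <= value < edges[i + 1], or -1 if the value lies outside the grid.
    static int FindBin(double[] edges, double value)
    {
        for (int i = 0; i + 1 < edges.Length; i++)
            if (value >= edges[i] && value < edges[i + 1]) return i;
        return -1;
    }

    // columnEdges: x positions of the vertical rules; rowEdges: y positions of the horizontal rules.
    // Assumption: the table is a regular grid (no merged cells).
    public static string[,] Bin(
        IEnumerable<(string Text, double X, double Y)> chunks,
        double[] columnEdges,
        double[] rowEdges)
    {
        if (columnEdges.Length < 2 || rowEdges.Length < 2)
            throw new ArgumentException("At least two edges are needed in each direction.");

        var xs = columnEdges.OrderBy(v => v).ToArray();
        var ys = rowEdges.OrderBy(v => v).ToArray();
        int rows = ys.Length - 1, cols = xs.Length - 1;
        var cells = new string[rows, cols];

        // Visit chunks top line first, left to right, so text inside a cell keeps reading order.
        foreach (var c in chunks.OrderByDescending(c => c.Y).ThenBy(c => c.X))
        {
            int col = FindBin(xs, c.X);
            int band = FindBin(ys, c.Y);
            if (col < 0 || band < 0) continue; // text outside the table

            int row = rows - 1 - band; // PDF y grows upward, so the highest band is row 0
            cells[row, col] = cells[row, col] == null ? c.Text : cells[row, col] + " " + c.Text;
        }
        return cells;
    }
}
```

Real tables with merged cells or missing rules need the recursive approach described above; this only covers the simplest case.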

Code:
ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, i+1, strategy);
string rawPdfContent = Encoding.UTF8.GetString(Encoding.Convert(Encoding.UTF8, Encoding.UTF8, pdfReader.GetPageContent(i + 1)));
Then I identified the lines (horizontal and vertical) in the PDF. In the content stream, lines are drawn either with the re operator (rectangle) or with the m (move to) and l (line to) operators.
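The line detection itself is not shown in this answer. A minimal sketch of how it might look, assuming the decoded page content is available as a string (such as rawPdfContent above): PathScanner is a hypothetical helper, and it is deliberately naive. It splits on whitespace, ignores the current transformation matrix (cm), curves, and text strings that happen to contain operator-like tokens.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class PathScanner
{
    // Returns line segments (x1, y1, x2, y2) in the coordinates used by the content stream.
    public static List<(double X1, double Y1, double X2, double Y2)> FindLines(string content)
    {
        var lines = new List<(double X1, double Y1, double X2, double Y2)>();
        var operands = new List<double>();
        double curX = 0, curY = 0;
        var separators = new[] { ' ', '\t', '\r', '\n' };

        foreach (var token in content.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            // Numbers are operands; they are collected until an operator shows up.
            if (double.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out var number))
            {
                operands.Add(number);
                continue;
            }

            int n = operands.Count;
            if (token == "m" && n >= 2)
            {
                // "x y m": start a new subpath at (x, y).
                curX = operands[n - 2];
                curY = operands[n - 1];
            }
            else if (token == "l" && n >= 2)
            {
                // "x y l": straight line from the current point to (x, y).
                lines.Add((curX, curY, operands[n - 2], operands[n - 1]));
                curX = operands[n - 2];
                curY = operands[n - 1];
            }
            else if (token == "re" && n >= 4)
            {
                // "x y w h re": a rectangle, recorded here as its four edges.
                double x = operands[n - 4], y = operands[n - 3], w = operands[n - 2], h = operands[n - 1];
                lines.Add((x, y, x + w, y));
                lines.Add((x + w, y, x + w, y + h));
                lines.Add((x + w, y + h, x, y + h));
                lines.Add((x, y + h, x, y));
                curX = x;
                curY = y;
            }
            operands.Clear(); // any operator consumes the pending operands
        }
        return lines;
    }
}
```

Segments with equal y values can then be treated as horizontal rules and segments with equal x values as vertical rules.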
Then I worked on the Marathi text that I got from iTextSharp.
Then I merged both: for the desired location, I extract the text using this code:
Int64 width = Convert.ToInt64(linesVertical[5].StartPoint.X) - Convert.ToInt64(linesVertical[2].StartPoint.X);
Int64 height = Convert.ToInt64(linesVertical[2].EndPoint.Y) - (Convert.ToInt64(linesVertical[2].StartPoint.Y));
System.util.RectangleJ rect = new System.util.RectangleJ(Convert.ToInt64(linesVertical[2].StartPoint.X), (800 - Convert.ToInt64(linesVertical[2].EndPoint.Y) + 150), width, height);
RenderFilter[] renderFilter = new RenderFilter[1];
renderFilter[0] = new RegionTextRenderFilter(rect);
ITextExtractionStrategy textExtractionStrategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), renderFilter);
Owner_Name = PdfTextExtractor.GetTextFromPage(reader, 1, textExtractionStrategy);

Inconsistent appearance between manual and coded versions of solid databar and databar minimum value

I am trying to create solid databars in EPPlus 4.0.4, and am running into two problems.
First, I haven't been able to figure out how to create a solid fill color.
Second, at least for small values, the bars aren't showing up the way I expect them to.
The screenshot below illustrates both issues. In both cases, the desired outcome is that of the databar I've added manually in Excel:
This is the code I'm currently using:
var bars = doc.ConditionalFormatting.AddDatabar(range, Color.FromArgb(99,195,132));
bars.HighValue.Type = eExcelConditionalFormattingValueObjectType.Num;
bars.LowValue.Type = eExcelConditionalFormattingValueObjectType.Num;
bars.HighValue.Value = numResponses; //82
bars.LowValue.Value = 0;
For the solid color, I've been trying out variations of values for the different properties of bars.Style.Fill, to no avail. If this is implemented, it is a simple matter of me not finding the right property.
I'm having a harder time understanding the second issue. If I go into "Manage rule" in Excel, the high and low values are properly set, and I have found no value I can change them to that will make their appearance match that of the manually created bars.

This is an extension list problem. This comes up a lot when getting into more complex exports. Conditional formatting is probably one of the tougher ones because there are so many nuances and it has changed so much over the years.
The extension list (extLst tags in the XML) is a kind of catchall bucket that the Office Open XML standard uses to add new features and formatting. In your case, Excel populates the extension list section to allow for the extended min/max limit. EPPlus does not support this, which is why you see the difference.
Your simplest option would be to inject it yourself via XML/string manipulation. Not pretty, but it gets the job done:
var bars = doc.ConditionalFormatting.AddDatabar(range, Color.FromArgb(99, 195, 132));
bars.HighValue.Type = eExcelConditionalFormattingValueObjectType.Num;
bars.LowValue.Type = eExcelConditionalFormattingValueObjectType.Num;
bars.HighValue.Value = numResponses; //82
bars.LowValue.Value = 0;
//Get reference to the worksheet xml for proper namespace
var xdoc = doc.WorksheetXml;
var nsm = new XmlNamespaceManager(xdoc.NameTable);
nsm.AddNamespace("default", xdoc.DocumentElement.NamespaceURI);
//Create the conditional format extension list entry
var extLstCf = xdoc.CreateNode(XmlNodeType.Element, "extLst", xdoc.DocumentElement.NamespaceURI);
extLstCf.InnerXml = @"<ext uri=""{B025F937-C7B1-47D3-B67F-A62EFF666E3E}"" xmlns:x14=""http://schemas.microsoft.com/office/spreadsheetml/2009/9/main""><x14:id>{3F3F0E19-800E-4C9F-9CAF-1E3CE014ED86}</x14:id></ext>";
var cfNode = xdoc.SelectSingleNode("/default:worksheet/default:conditionalFormatting/default:cfRule", nsm);
cfNode.AppendChild(extLstCf);
//Create the extension list content for the worksheet
var extLstWs = xdoc.CreateNode(XmlNodeType.Element, "extLst", xdoc.DocumentElement.NamespaceURI);
extLstWs.InnerXml = @"<ext uri=""{78C0D931-6437-407d-A8EE-F0AAD7539E65}"" xmlns:x14=""http://schemas.microsoft.com/office/spreadsheetml/2009/9/main""><x14:conditionalFormattings><x14:conditionalFormatting xmlns:xm=""http://schemas.microsoft.com/office/excel/2006/main""><x14:cfRule type=""dataBar"" id=""{3F3F0E19-800E-4C9F-9CAF-1E3CE014ED86}""><x14:dataBar minLength=""0"" maxLength=""100"" gradient=""0""><x14:cfvo type=""num""><xm:f>0</xm:f></x14:cfvo><x14:cfvo type=""num""><xm:f>82</xm:f></x14:cfvo><x14:negativeFillColor rgb=""FFFF0000""/><x14:axisColor rgb=""FF000000""/></x14:dataBar></x14:cfRule><xm:sqref>B2:B11</xm:sqref></x14:conditionalFormatting></x14:conditionalFormattings></ext>";
var wsNode = xdoc.SelectSingleNode("/default:worksheet", nsm);
wsNode.AppendChild(extLstWs);
pck.Save();
Note the gradient=""0"" which will set the color bars to solid instead of a gradient as well as the min/max settings to get the spread you are looking for.
A more "proper" way would be to recreate the XML objects node by node and attribute by attribute, which will take a while, but you only have to do it once.
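As a rough idea of what the node-by-node approach might look like, here is a sketch that builds only the x14:dataBar element with System.Xml. DataBarXml is a hypothetical helper; the element names, attributes and namespaces are copied from the string version above, and the rest of the extension list (ext, x14:cfRule, xm:sqref) would have to be built the same way.

```csharp
using System.Xml;

public static class DataBarXml
{
    const string X14 = "http://schemas.microsoft.com/office/spreadsheetml/2009/9/main";
    const string XM = "http://schemas.microsoft.com/office/excel/2006/main";

    // Builds <x14:dataBar> with a solid fill and numeric min/max values.
    // The caller is expected to append the result to an x14:cfRule element.
    public static XmlElement CreateDataBar(XmlDocument xdoc, string min, string max)
    {
        var dataBar = xdoc.CreateElement("x14", "dataBar", X14);
        dataBar.SetAttribute("minLength", "0");
        dataBar.SetAttribute("maxLength", "100");
        dataBar.SetAttribute("gradient", "0"); // 0 = solid bar instead of a gradient

        dataBar.AppendChild(CreateCfvo(xdoc, min));
        dataBar.AppendChild(CreateCfvo(xdoc, max));

        var negativeFill = xdoc.CreateElement("x14", "negativeFillColor", X14);
        negativeFill.SetAttribute("rgb", "FFFF0000");
        dataBar.AppendChild(negativeFill);

        var axisColor = xdoc.CreateElement("x14", "axisColor", X14);
        axisColor.SetAttribute("rgb", "FF000000");
        dataBar.AppendChild(axisColor);

        return dataBar;
    }

    // Builds <x14:cfvo type="num"><xm:f>value</xm:f></x14:cfvo>.
    static XmlElement CreateCfvo(XmlDocument xdoc, string value)
    {
        var cfvo = xdoc.CreateElement("x14", "cfvo", X14);
        cfvo.SetAttribute("type", "num");
        var formula = xdoc.CreateElement("xm", "f", XM);
        formula.InnerText = value;
        cfvo.AppendChild(formula);
        return cfvo;
    }
}
```

With this in place, the hard-coded 82 could be replaced by numResponses.ToString() when calling CreateDataBar(xdoc, "0", ...).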

Measure String in MigraDoc TextFrame

I've already tried asking the question on their forums but have yet to receive a response; I hope someone can help me here.
I have set up a custom report screen in ASP.NET where people can drag labels and fields around, and MigraDoc produces the report accordingly, using TextFrames and their top/left/width/height properties to lay the items out in the same place they were dragged/resized to. This all works great. However, one issue I have is that if the text is longer than the TextFrame, it runs off the page, and I need the page to move on accordingly whilst keeping the other objects in place.
I can use the below code to measure a string:
Style style = document.Styles["Normal"];
TextMeasurement tm = new TextMeasurement(style.Font.Clone());
float fh = tm.MeasureString(value, UnitType.Millimeter).Height;
float fw = tm.MeasureString(value, UnitType.Millimeter).Width;
But it's only useful for comparing the width against the frame and not the height because it could be different once put into a smaller area. Does anyone know how I can measure this string based on bound width/height values i.e. within a text frame.

Look at the CreateBlocks() method in the XTextFormatter class and how it calls MeasureString in a loop to break the text to multiple lines.
I'm afraid you have to implement such a loop yourself.
Or maybe use the PrepareDocument() method of the DocumentRenderer class to let MigraDoc do the work and just query the dimensions when it's done.
BTW: a similar question had been asked at the forum before:
http://forum.pdfsharp.net/viewtopic.php?p=3590#p3590
Answer includes some source code.
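A loop of the kind described above might be sketched like this. It is not the code from XTextFormatter; WrapMeasurer is a hypothetical helper that takes the measuring function as a delegate, so any measurer can be plugged in (for example a lambda around tm.MeasureString(s, UnitType.Millimeter).Width from the question). It breaks only at spaces and does not split words that are wider than the frame.

```csharp
using System;
using System.Collections.Generic;

public static class WrapMeasurer
{
    // Greedy word wrap: keep adding words to the current line until the
    // measured width would exceed maxWidth, then start a new line.
    public static List<string> Wrap(string text, double maxWidth, Func<string, double> measureWidth)
    {
        var lines = new List<string>();
        var current = "";
        foreach (var word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var candidate = current.Length == 0 ? word : current + " " + word;
            if (current.Length > 0 && measureWidth(candidate) > maxWidth)
            {
                lines.Add(current);
                current = word;
            }
            else
            {
                current = candidate;
            }
        }
        if (current.Length > 0) lines.Add(current);
        return lines;
    }

    // Estimated height of the wrapped text: number of lines times the height of one line.
    public static double Height(string text, double maxWidth, double lineHeight, Func<string, double> measureWidth)
    {
        return Wrap(text, maxWidth, measureWidth).Count * lineHeight;
    }
}
```

Comparing the result of Height with the height of the TextFrame tells you whether the text overflows, and by roughly how many lines.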

An easy way to do this (using I-liked-the-old-stack-overflow's link) is to add the PdfWordWrapper class to your project and then calculate the dimensions of your text as follows:
var wrapper = new PdfWordWrapper(g, contentWidth); //g is your XGraphics object
wrapper.Add("My text here", someFont, XBrushes.Black);
wrapper.Process();
var dimensions = wrapper.Size; //you can access .Height or .Width
//If you want to reuse the wrapper just call .Clear() and then .Add() again with some new text

OpenXML: Any way to see if a Word Document fits one page

While I doubt it, if I open up a Word document using the OpenXML SDK in C# and add some info, is there any way for me to see if it still fits on one page?
If it doesn't, I want to reduce the font size on specific items I added until it fits.
I could write this algorithm if I had the current size in relation to page size with margins and all that.

I ran across this example on another site, don't know if it'll work in your case, as it requires the Office PIA...
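// Requires a reference to the Word PIA and: using Word = Microsoft.Office.Interop.Word;
var app = new Word.Application();
var doc = app.Documents.Open("path/to/file");
doc.Repaginate();
// With embedded interop types, BuiltInDocumentProperties is dynamic, so index it and cast the value.
var pageNumber = (int)doc.BuiltInDocumentProperties["Number of Pages"].Value;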
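Once a page count is available, the shrinking step from the question could be a simple loop. The sketch below is only an outline: FontFitter is a hypothetical helper, and countPages stands for whatever applies a font size to the added items, saves the document, and returns the page count (for example via the Word interop call above).

```csharp
using System;

public static class FontFitter
{
    // Tries font sizes from startSize downward in fixed steps and returns the
    // first size for which the document fits on one page, or null if none does.
    public static double? FindFittingFontSize(
        double startSize, double minSize, double step, Func<double, int> countPages)
    {
        if (step <= 0) throw new ArgumentException("step must be positive");
        for (double size = startSize; size >= minSize; size -= step)
        {
            if (countPages(size) <= 1) return size;
        }
        return null;
    }
}
```

Each call to countPages is likely to be slow if it round-trips through Word, so a coarse step (or a binary search over the size range) may be worth considering.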

Stacked Chart using MSChart

I am attempting to create a stacked chart using the relatively new Microsoft Chart Controls. I am sure that I am missing something obvious but a bit of help will go a long way. The below code creates a chart with two columns. I'd like the columns to be stacked on top of each other. Further, I'd like the total of the two to be displayed on the chart. Any help would be much appreciated.
Series activeSeries = new Series("Active");
activeSeries.ChartType = SeriesChartType.StackedColumn;
activeSeries.BorderWidth = 3;
activeSeries.ShadowOffset = 2;
activeSeries.Points.AddY(3000);
LaptopChart.Series.Add(activeSeries);
Series inactiveSeries = new Series("Inactive");
inactiveSeries.ChartType = SeriesChartType.StackedColumn;
inactiveSeries.BorderWidth = 3;
inactiveSeries.ShadowOffset = 2;
activeSeries.Points.AddY(987);
LaptopChart.Series.Add(inactiveSeries);

Boneheaded move: when creating the second series, I added the inactive points to the active series. Sometimes, no matter how often you walk through your own code, it takes a second set of eyes to find things. Sorry for wasting anyone's time looking at this. The second call, activeSeries.Points.AddY(987);, should be inactiveSeries.Points.AddY(987);
