Is it possible, using iTextSharp, to get all text occurrences contained in a specified area of a PDF document?
Thanks.
First you need the actual coordinates of the rectangle you marked in Red. On sight, I'd say the x value 144 (2 inches) is probably about right, but it would surprise me if the y value is 76, so you'll have to double check.
Once you have the exact coordinates of the rectangle, you can use iText's text extraction functionality using a LocationTextExtractionStrategy as is done in the ExtractPageContentArea example.
For the iTextSharp version of this example, see the C# port of the examples of chapter 15.
System.util.RectangleJ rect = new System.util.RectangleJ(70, 80, 420, 500);
RenderFilter[] filter = {new RegionTextRenderFilter(rect)};
ITextExtractionStrategy strategy = new FilteredTextRenderListener(
new LocationTextExtractionStrategy(), filter);
text = PdfTextExtractor.GetTextFromPage(reader, 1, strategy);
@BrunoLowagie gives an excellent answer, but something I really struggled with was getting the actual coordinates to use. I started out using Cursor Coordinates in Adobe Acrobat Pro.
From there I could get the coordinates in inches and convert them to DTP points (PostScript points) by multiplying by 72.
However, something was still not right: the Y value seemed way off. I then noticed that Adobe Acrobat measures coordinates in this view from the top left instead of the bottom left, so Y has to be recalculated as the page height minus the value shown.
I solved this in code like this:
var rect = new RectangleJ(GetPostScriptPoints(4.19f),
GetPostScriptPoints(GetInverseCoordinateInInches(pdfReader, 1, 1.42f)),
GetPostScriptPoints(3.5f), GetPostScriptPoints(0.39f));
RenderFilter[] filter = { new RegionTextRenderFilter(rect) };
ITextExtractionStrategy strategy = new FilteredTextRenderListener(
new LocationTextExtractionStrategy(), filter);
var output = PdfTextExtractor.GetTextFromPage(pdfReader, 1, strategy);
private float GetPostScriptPoints(float inch)
{
return inch * 72;
}
private float GetInverseCoordinateInInches(PdfReader pdfReader, int pageIndex, float coordinateInInches)
{
Rectangle mediabox = pdfReader.GetPageSize(pageIndex);
return mediabox.Height / 72 - coordinateInInches;
}
This worked but I think it looks a little messy. I then used the tool Prepare Form in Adobe Acrobat Pro and here the Y coordinate showed up correctly when looking at Text Field Properties. It could also convert the box into points right away.
This means I could write code like this instead:
var rect = new RectangleJ(301.68f, 738f, 252f, 28.08f);
RenderFilter[] filter = { new RegionTextRenderFilter(rect) };
ITextExtractionStrategy strategy = new FilteredTextRenderListener(
new LocationTextExtractionStrategy(), filter);
var output = PdfTextExtractor.GetTextFromPage(pdfReader, 1, strategy);
This was a lot cleaner and faster, so this was the way I chose to do it in the end.
See this answer if you would like to get a value from a specific location for every page in the document:
https://stackoverflow.com/a/20959388/3850405
XSize Si1 = gfx.MeasureString(student_Name.ToString(), fontbody2);
var height = Convert.ToInt32(Si1.Width);
var j3 = height / 150;
var j2 = (j3 * 20);
XRect rect1 = new XRect(135, x, 150, 150);
tf.DrawString(student_Name.ToString(), fontbody2, XBrushes.Black, rect1, XStringFormat.TopLeft);
The output wraps onto a new line, but only at spaces.
It does not treat ',' as a line-break opportunity, so the text runs beyond the rectangle.
The XTextFormatter class that comes with PDFsharp as a demo breaks lines at whitespace. In normal texts, a comma will be followed by a space and the line will break at the space.
If you have special needs, take the source code of the XTextFormatter class and adjust it to your needs.
Your code snippet is somewhat incomplete. From the variable name I assume you are using XTextFormatter.
I need to create a PDF file measuring 100 mm x 150 mm (width x height). To do this I tried the following:
var doc = new iTextSharp.text.Document(new Rectangle(100f, 150f));
and
float height = 0;
float width = 0;
float.TryParse("100", out width);
float.TryParse("150", out height);
var doc = new iTextSharp.text.Document(new Rectangle(width, height));
But both of these generate a PDF of the wrong size (smaller or larger than expected). Please suggest how I can convert the millimeters to the float values required and make this work.
The measurement used in PDF is called the user unit. By default 1 user unit equals 1 point. There are 72 points in one inch. This explains why your document is smaller than expected if you pass a value that is expressed in millimeters rather than user units.
If you want to use millimeters and you don't want to do the math, you can use the static millimetersToPoints() method in the Utilities class (MillimetersToPoints() in iTextSharp).
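If you would rather see the arithmetic, here is a minimal sketch of the conversion, assuming the default user unit of 1 point. The class `PageUnits` and its members are hypothetical names used only for illustration and are not part of iTextSharp.

```csharp
using System;

public static class PageUnits
{
    // 1 inch = 25.4 mm = 72 points (with the default user unit).
    public const float PointsPerInch = 72f;
    public const float MillimetersPerInch = 25.4f;

    // Converts a length in millimeters to points.
    public static float MillimetersToPoints(float millimeters)
    {
        return millimeters / MillimetersPerInch * PointsPerInch;
    }
}
```

For the 100 mm x 150 mm page in the question this gives roughly 283.46 x 425.20 points, which are the values to pass to `new Rectangle(width, height)`.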
I am currently doing a project in which I've managed to identify the peak I want. However, I would like to go further and circle that particular point with a label attached to it. Is it possible to do that in ZedGraph?
I've attached a snippet of my code, which only adds a text label to that point; I want to do more so that people can identify the point more easily.
PointPair pt = myCurve.Points[i-1];
const double offset = 0.8;
TextObj text = new TextObj("P", pt.X, pt.Y + offset,
CoordType.AxisXYScale, AlignH.Left, AlignV.Center);
text.ZOrder = ZOrder.A_InFront;
text.FontSpec.Border.IsVisible = false;
text.FontSpec.Fill.IsVisible = false;
text.FontSpec.Fill = new Fill( Color.FromArgb( 100, Color.White ) );
myPane.GraphObjList.Add(text);
Any help is appreciated! Thanks!
Make a LineItem as follows
LineItem line = new LineItem("Point", new double[] {pt.X}, new double[] {pt.Y}, Color.Black, SymbolType.Circle);
line.Symbol.Size = 20;
line.Symbol.Fill = new Fill(Color.Transparent);
myPane.CurveList.Add(line);
This should create a large empty circle centered around your point. Obviously, you can adjust color and size as you see fit, and the ZOrder if you need to. You might want to adjust your legend so it doesn't include this point. Alternatively, you can name this line with your label and leave it in the legend as a way of tagging it. The only other way for a label is to do what you're doing, as I'm not sure of a way to associate labels directly to a line.
We're using iTextSharp for reasons I do not understand, we don't have the book yet, and I have an annoying problem that I'd appreciate help with.
I'm working with a footer and I cannot get it to align as I want it.
The function I have takes a list of strings, generally 4 strings that I want on a row each. It does not seem like iTextSharp can handle strings with line breaks, which is the reason for the list.
Now this does not position correctly for me: the first string looks OK, but the second string is a bit longer and is half outside the document, as is the third string, and the 4th is not even visible even though there is 1 cm of space left.
Thanks for help!
public PdfTemplate AddPageText(IList<string> stringList, PdfWriter writer)
{
var cb = writer.DirectContent;
PdfTemplate footerTemplate = cb.CreateTemplate(450, 120); //55);
footerTemplate.BeginText();
BaseFont bf2 = BaseFont.CreateFont(BaseFont.TIMES_ITALIC, BaseFont.WINANSI, BaseFont.NOT_EMBEDDED);
footerTemplate.SetFontAndSize(bf2, 9);
footerTemplate.SetColorStroke(BaseColor.DARK_GRAY);
footerTemplate.SetColorFill(BaseColor.GRAY);
footerTemplate.SetLineWidth(3);
footerTemplate.LineTo(50, footerTemplate.YTLM);
int y = 10;
foreach (string text in stringList)
{
float widthoftext = 500.0f - bf2.GetWidthPoint(text, 9);
footerTemplate.ShowTextAligned(PdfContentByte.ALIGN_RIGHT, text, widthoftext, 50 - y, 0);
y += y;
}
footerTemplate.EndText();
return footerTemplate;
}
If you are doing string placement and using DirectContent then you are responsible for the content. In this case you will need to calculate the string rectangles and wrap accordingly.
I would suggest, however, moving to using a table with cells for the text. Tables wrap text and handle some of the issues you are dealing with.
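Separately from the table suggestion, one detail in the posted loop probably contributes to the disappearing lines: `y += y` doubles the offset on every pass (10, 20, 40, 80), so the baselines land at 40, 30, 10 and then 50 - 80 = -30, which is below the template. Note also that with ALIGN_RIGHT the x argument of ShowTextAligned is the right edge of the text, so subtracting the text width from it should not be necessary. Below is a minimal sketch of evenly spaced baselines, assuming a constant line spacing; `FooterLayout` and `Baselines` are hypothetical names, not iTextSharp API.

```csharp
using System;

public static class FooterLayout
{
    // Returns the baseline y coordinate for each line, starting at topBaseline
    // and moving down by a constant lineSpacing per line.
    public static float[] Baselines(float topBaseline, float lineSpacing, int lineCount)
    {
        var result = new float[lineCount];
        for (int i = 0; i < lineCount; i++)
        {
            result[i] = topBaseline - i * lineSpacing;
        }
        return result;
    }
}
```

Calling `FooterLayout.Baselines(40f, 10f, 4)` yields 40, 30, 20 and 10, so all four strings stay inside the template; each value would be passed as the y argument when showing the corresponding string.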
I am writing a function that applies special formatting to predetermined keywords when printing a string. For example, in the string - "Why won't this work?" I might need to print the word "Why" underlined and in blue.
I've had to implement this in pieces, printing each segment of a string with a separate call to print. This approach works with one problem - I cannot get the spacing correct when printing the strings. My keywords print over top of previous default text and are overlapped in turn by text printed afterward.
I am using bounding rectangles to place my strings on the printed page.
RectangleF rectKeywordBounds = new RectangleF(60.0f, 60.0f, 550.0f, 1200.0f);
Once I've printed a segment of the string, I modify the size of the rectangle by the number of characters drawn and I print the next segment of the string.
EArgs.Graphics.DrawString(strFragment, fontBlueItalics, Brushes.Blue, rectKeywordBounds );
iLastPrintIndex = strFragment.Length + iLastPrintIndex;
I've used this method to change the print position of the new string segment:
rectKeywordBounds = new Rectangle(rectKeywordBounds.X + iLastPrintIndex, rectKeywordBounds.Y, rectKeywordBounds.Width, rectKeywordBounds.Height);
And I've used this one:
properSpacing = new SizeF(-((float)iLastPrintIndex), 0.0f);
rectKeywordBounds.Inflate(properSpacing);
Both methods result in the same overlapping of segments. The following code advances a bounding rectangle in the fashion I expect, so why doesn't the concept work when printing text within the rectangle?
Rectangle rectKeywordBounds = new Rectangle(90, 90, 800, 100);
for (int x = 0; x < 6; x++)
{
EventArgs.Graphics.DrawRectangle(Pens.BlueViolet, rectKeywordBounds);
rectKeywordBounds = new Rectangle(rectKeywordBounds.X + 15, rectKeywordBounds.Y + 200, rectKeywordBounds.Width, rectKeywordBounds.Height);
}
You don't say how you are calculating how far to advance the rectangle -- you seem to be using the length of the string, which won't work as this is a character count rather than a coordinate value.
Instead, you should use the Graphics.MeasureString method to calculate how much space is required to print the current segment, and advance the rectangle that much. You may also be able to do this all in one go using Graphics.MeasureCharacterRanges if you are using the same font for all segments.
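The idea of advancing by measured width rather than by character count can be sketched as follows. This is only an illustration: `SegmentLayout` and `StartPositions` are hypothetical names, and the `measureWidth` delegate stands in for a real measurement such as `Graphics.MeasureString(fragment, font).Width`, which is left out so the sketch does not depend on System.Drawing.

```csharp
using System;
using System.Collections.Generic;

public static class SegmentLayout
{
    // Returns the x coordinate at which each fragment should start.
    // measureWidth is a stand-in for a real text measurement, for example
    // fragment => graphics.MeasureString(fragment, font).Width.
    public static List<float> StartPositions(
        float startX, IList<string> fragments, Func<string, float> measureWidth)
    {
        var positions = new List<float>();
        float x = startX;
        foreach (string fragment in fragments)
        {
            positions.Add(x);            // this fragment starts where the previous one ended
            x += measureWidth(fragment); // advance by the measured width, not the character count
        }
        return positions;
    }
}
```

Each returned position would become the X of the bounding rectangle for the corresponding fragment, with the width reduced by the same amount.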
By the way, if you have the option to use WPF instead of GDI+, then the TextBlock class will take care of all measurement and layout for you.