I'm trying to read text from a PDF document using the iTextSharp library. One particular document returns only '?' characters, while other documents work without any problem.
What is the reason for that?
Here is my code:
private void readPDF()
{
    string pdfTemplate = @"c:\\test2.pdf";
    // Form title
    this.Text += " - " + pdfTemplate;
    String strText = "";
    try
    {
        PdfReader reader = new PdfReader(pdfTemplate);
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
            String s = PdfTextExtractor.GetTextFromPage(reader, page, its);
            s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
            strText = strText + s;
        }
        reader.Close();
        textBox1.Text = strText;
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}
Any ideas? Thanks.
I am working on a WinForms application. I use a PDF file to reset the password, and the values in the PDF are stored as key-value pairs (email: xxxx@mail.com, pass: 11111).
What I want to do:
Read the PDF file line by line and fill the appropriate textboxes.
What I have done:
public bool CreatePDF(string location, string email, string key)
{
    if (location != "" && email != "" && key != "")
    {
        PdfWriter pdfwriter = new PdfWriter(location);
        PdfDocument pdf = new PdfDocument(pdfwriter);
        Document document = new Document(pdf);
        Paragraph fields = new Paragraph("Email: " + email + "\n" + "Secret Key: " + key);
        document.Add(fields);
        document.Close();
        return true;
    }
    else
    {
        return false;
    }
}
public string ReadPDF(string location)
{
    var pdfDocument = new PdfDocument(new PdfReader(location));
    StringBuilder processed = new StringBuilder();
    var strategy = new LocationTextExtractionStrategy();
    string text = "";
    for (int i = 1; i <= pdfDocument.GetNumberOfPages(); ++i)
    {
        var page = pdfDocument.GetPage(i);
        text += PdfTextExtractor.GetTextFromPage(page, strategy);
        processed.Append(text);
    }
    return text;
}
Thank you in advance, guys! Any suggestions on CreatePDF are also welcome.
This is what I came up with:
var pdfDocument = new PdfDocument(new PdfReader("G:\\Encryption_File.pdf"));
StringBuilder processed = new StringBuilder();
for (int i = 1; i <= pdfDocument.GetNumberOfPages(); ++i)
{
    var page = pdfDocument.GetPage(i);
    // Use a fresh strategy per page so earlier pages are not extracted again.
    processed.AppendLine(PdfTextExtractor.GetTextFromPage(page, new LocationTextExtractionStrategy()));
}
pdfDocument.Close();
// One "Key: value" pair per line: line 0 is the email, line 1 is the secret key.
string[] newLines = processed.ToString().Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
// Take everything after the first ':' so values that contain ':' stay intact.
textBox1.Text = newLines[0].Substring(newLines[0].IndexOf(':') + 1).Trim();
textBox2.Text = newLines[1].Substring(newLines[1].IndexOf(':') + 1).Trim();
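Once the text has been extracted, the "Key: value" lines can be parsed without touching iText at all. The following is only a minimal sketch, assuming each field sits on its own line, as CreatePDF writes them. KeyValueText and Parse are hypothetical names used for illustration and are not part of any library.

```csharp
using System;
using System.Collections.Generic;

public static class KeyValueText
{
    // Parses lines of the form "Key: value" into a dictionary.
    // Only the first ':' on a line separates the key from the value,
    // so a value may itself contain colons.
    public static Dictionary<string, string> Parse(string text)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string[] lines = text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            int colon = line.IndexOf(':');
            if (colon <= 0)
            {
                continue; // no key on this line, skip it
            }
            string key = line.Substring(0, colon).Trim();
            string value = line.Substring(colon + 1).Trim();
            fields[key] = value;
        }
        return fields;
    }
}
```

With this helper, the text boxes could be filled by name, for example textBox1.Text = fields["Email"] and textBox2.Text = fields["Secret Key"], which avoids depending on the position of each line.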
Good afternoon, everyone. I have a question about taking the data in a TableLayoutPanel and placing it into a .pdf using iTextSharp (unless someone knows a better technology). The TableLayoutPanel consists of 1 column with 1 row by default, and rows are added dynamically depending on what the data returns.
Here is what I have for printing:
private void btnPrint_Click(object sender, EventArgs e)
{
    try
    {
        SaveFileDialog dialog = new SaveFileDialog();
        dialog.Title = "Save file as...";
        dialog.Filter = "Pdf File |*.pdf";
        if (dialog.ShowDialog() == DialogResult.OK)
        {
            Document doc = new Document(PageSize.LETTER);
            PdfWriter writer = PdfWriter.GetInstance(doc, new FileStream(dialog.FileName, FileMode.Create));
            doc.Open();
            Paragraph entry1 = new Paragraph("Hello World!");
            //Page 1 Printing
            PdfPTable LegendsForTable = new PdfPTable(this.tblPnlLayLFT.ColumnCount);
            doc.Add(entry1);
            doc.Close();
            MessageBox.Show("File saved");
        }
    }
    catch (Exception exception)
    {
        MessageBox.Show(@"ERROR: Issue encountered while trying to print. " + Environment.NewLine
            + @"Contact ITSupport with the following error" + Environment.NewLine
            + exception);
    }
}
Does anyone know a method to copy a TableLayoutPanel to a .pdf?
I was able to figure this out myself.
Step 1 was to create a loop that iterates through the labels in the table layout panel and places their text into a list, in the order that I want.
int reIterator = 1;
int replicateIterator = 1;
List<string> table1List = new List<string>();
for (int counter = 0; counter < 6; counter++)
{
    while (reIterator < 7)
    {
        string currentLabel = "LblRE" + reIterator + "R" + replicateIterator;
        Label reLabel = this.Controls.Find(currentLabel, true).FirstOrDefault() as Label;
        // A missing label or a null Text becomes an empty cell.
        table1List.Add(reLabel?.Text ?? "");
        reIterator = reIterator + 1;
    }
    //Builds next row
    if (reIterator == 7)
    {
        replicateIterator = replicateIterator + 1;
        reIterator = 1;
    }
}
Then using iTextSharp I am able to loop through using the list and add the data to a PDF.
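That second step might look roughly like the sketch below. It assumes iTextSharp 5.x, a non-empty list, and six columns per row as in the loop above. TableExport and WriteTable are hypothetical names, and the original poster's actual code was not shown, so this is one possible approach rather than their method.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class TableExport
{
    // Writes a flat list of cell values into a PDF table, filling rows left to right.
    // Assumes cells is non-empty; an empty document cannot be closed cleanly.
    public static void WriteTable(string path, IList<string> cells, int columnCount)
    {
        Document doc = new Document(PageSize.LETTER);
        PdfWriter.GetInstance(doc, new FileStream(path, FileMode.Create));
        doc.Open();

        PdfPTable table = new PdfPTable(columnCount);
        foreach (string cell in cells)
        {
            table.AddCell(cell ?? "");
        }

        // Pad the final row with empty cells so that it is complete;
        // to my knowledge an incomplete last row is not rendered.
        int remainder = cells.Count % columnCount;
        if (remainder != 0)
        {
            for (int i = remainder; i < columnCount; i++)
            {
                table.AddCell("");
            }
        }

        doc.Add(table);
        doc.Close();
    }
}
```

A call such as TableExport.WriteTable(dialog.FileName, table1List, 6) would then replace the "Hello World!" paragraph in the print handler.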
I have finally, successfully, figured out how to fill a PDF with an XFA Form with my custom data using iTextSharp.
The problem is that I've lost the code that I had that let me make the XFA read-only. I have made the horrible mistake of changing my code before committing a working version to my source control. And now, after searching Google for like an hour I still can't find it :( If someone could remind me of the code that would be much appreciated.
PdfReader.unethicalreading = true;
PdfReader reader = new PdfReader(pdfFileName);
PdfStamper stamper = new PdfStamper(reader, ms);
XfaForm xfa = new XfaForm(reader);
XmlDocument doc = new XmlDocument();
doc.LoadXml(CreateXmaData(XDocument.Parse(xfa.DomDocument.InnerXml)));
xfa.DomDocument = doc;
xfa.Changed = true;
XfaForm.SetXfa(xfa, stamper.Reader, stamper.Writer);
PdfAction action = new PdfAction(PdfAction.PRINTDIALOG);
stamper.Writer.SetOpenAction(action);
// Somewhere here I had the code that made my XFA form read only...
stamper.Writer.CloseStream = false;
stamper.Close();
reader.Close();
byte[] buffer = new byte[ms.Position];
ms.Position = 0;
ms.Read(buffer, 0, buffer.Length);
return buffer;
Not sure if I was dreaming that I had the read-only working or what, and I doubt that this is the best way, but here is how I was finally able to do it:
...
doc.LoadXml(CreateXmaData(XDocument.Parse(xfa.DomDocument.InnerXml)));
PdfAction readOnlyAction = PdfAction
.JavaScript(MakeReadOnly(xfa.DomDocument.InnerXml), stamper.Writer);
stamper.Writer.AddJavaScript(readOnlyAction);
xfa.DomDocument = doc;
...
private string MakeReadOnly(string xml)
{
    string formName = string.Empty;
    int subFormStart = xml.IndexOf("<subform", 0);
    if (subFormStart > -1)
    {
        int nameTagStart = xml.IndexOf("name", subFormStart);
        int nameStart = xml.IndexOf("\"", nameTagStart);
        int nameEnd = xml.IndexOf("\"", nameStart + 1);
        formName = xml.Substring(nameStart + 1, (nameEnd - nameStart) - 1);
    }
    string readOnlyFunction = "ProcessAllFields(xfa.form." + formName + ");";
    readOnlyFunction += "function ProcessAllFields(oNode) {";
    readOnlyFunction += " if (oNode.className == \"exclGroup\" || oNode.className == \"subform\" || oNode.className == \"subformSet\" || oNode.className == \"area\") { ";
    readOnlyFunction += " for (var i = 0; i < oNode.nodes.length; i++) {";
    readOnlyFunction += " var oChildNode = oNode.nodes.item(i); ProcessAllFields(oChildNode);";
    readOnlyFunction += " }";
    readOnlyFunction += " } else if (oNode.className == \"field\") {";
    readOnlyFunction += " oNode.access = \"readOnly\"";
    readOnlyFunction += " }";
    readOnlyFunction += "}";
    return readOnlyFunction;
}
This worked for me:
String script = "for (var nPageCount = 0; nPageCount < xfa.host.numPages; nPageCount++) { var oFields = xfa.layout.pageContent(nPageCount, \"subform\"); var nNodesLength = oFields.length;";
script += "for (var nNodeCount = 0; nNodeCount < nNodesLength; nNodeCount++) { oFields.item(nNodeCount).access = \"readOnly\"; } } ";
I'm trying to extract the textual content of a PDF with the following code:
PdfReader reader = new PdfReader(path);
string strText = string.Empty;
for (int page = 1; page <= reader.NumberOfPages; page++)
{
    string s = PdfTextExtractor.GetTextFromPage(reader, page);
    strText += " " + s;
}
reader.Close();
NumberOfPages returns 257, but at page 227, GetTextFromPage() throws an IndexOutOfRangeException.
Any help is appreciated.
I resolved this issue by updating my version of iTextSharp from 5.1 to 5.2.
First, please excuse my bad English!
I want to search a PDF document for a word like "Hello", so I have to read each page of the PDF with PdfTextExtractor. That part works: I can read all the words on each page separately and save them in a string buffer.
But when I put this code in a for loop (for example, to search pages 1 to 7), the words from earlier pages remain in the string buffer. I hope you understand my problem.
Thanks, all.
This is my code:
PdfReader reader2 = new PdfReader(openFileDialog1.FileName);
int pagen = reader2.NumberOfPages;
reader2.Close();
ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
for (int i = 1; i < pagen; i++)
{
    textBox1.Text = "";
    PdfReader reader = new PdfReader(openFileDialog1.FileName);
    String s = PdfTextExtractor.GetTextFromPage(reader, i, its);
    //MessageBox.Show(s.Length.ToString());
    //PdfTextArray h = new PdfTextArray(s);
    //
    // s = "";
    s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
    textBox1.Text = s;
    reader.Close();
}
Unfortunately, SimpleTextExtractionStrategy doesn't let you reset it: each instance keeps accumulating the text it has seen. You must therefore move your "new SimpleTextExtractionStrategy()" inside the loop instead of reusing the same object.
There is another potential problem in the statement which controls your loop:
for (int i = 1; i < pagen; i++)
With this condition the last page is never processed, and if pagen = 1 the loop is not executed at all. It should read:
for (int i = 1; i <= pagen; i++)
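Putting both points together, a per-page search could be sketched as follows. This assumes iTextSharp 5.x and uses only the calls that already appear in the question. PdfSearch and FindPages are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfSearch
{
    // Returns the 1-based numbers of the pages whose text contains the word.
    public static List<int> FindPages(string path, string word)
    {
        var hits = new List<int>();
        // Open the file once and reuse the reader for every page.
        PdfReader reader = new PdfReader(path);
        try
        {
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                // A fresh strategy per page, so text from earlier pages is not carried over.
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                string pageText = PdfTextExtractor.GetTextFromPage(reader, i, strategy);
                if (pageText.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    hits.Add(i);
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return hits;
    }
}
```

For example, PdfSearch.FindPages(openFileDialog1.FileName, "Hello") would return the pages on which "Hello" occurs, ignoring case.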
public string ReadPdfFile(object Filename, DataTable ReadLibray)
{
    // Open the file once and reuse the reader for every page.
    PdfReader reader = new PdfReader((string)Filename);
    string strText = string.Empty;
    try
    {
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            // New strategy per page so text from earlier pages is not repeated.
            ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
            // GetTextFromPage already returns a .NET string; no re-encoding is needed.
            strText = strText + PdfTextExtractor.GetTextFromPage(reader, page, its);
        }
    }
    finally
    {
        reader.Close();
    }
    return strText;
}
This code is very helpful for reading a PDF using iText.