I'm looking for a solution, either paid or free...
I have superscript text stored in SQL in RTF format, and I need to print the superscript on a PDF document along with other text. So, for example, the PDF doc might read "Equation 1:" and then print the superscript text extracted from SQL.
I have been searching for an easy way to do this and have so far come up empty.
The current PDF docs are made with PDFsharp, but I'm happy to change that for a workable solution.
I thought of converting the RTF to an image with PdfConverter and then placing the image on the PDF doc, but that doesn't seem to work.
I tried to do this with the following code, but it throws the error "Parameter is not valid".
PdfConverter pdfConverter = new PdfConverter();
byte[] rtfstring = pdfConverter.GetPdfBytesFromRtfString(spec);
ImageConverter conv = new ImageConverter();
Image i = (Image)conv.ConvertFrom(rtfstring);
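A likely cause, stated here as an assumption since the thread does not confirm it: as its name suggests, GetPdfBytesFromRtfString returns the bytes of a PDF file, and an image converter can only decode image formats such as PNG or JPEG. A quick way to check what a byte array really contains is to look at its first few bytes. The sketch below is in Java, like the iText snippet later in this thread, and the class and method names (ByteSniffer, sniff) are hypothetical. The same check ports directly to C#.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: identifies a few common formats by their leading "magic" bytes.
class ByteSniffer {
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G'};
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};

    static String sniff(byte[] data) {
        if (startsWith(data, PDF)) return "pdf";
        if (startsWith(data, PNG)) return "png";
        if (startsWith(data, JPEG)) return "jpeg";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for the bytes returned by the RTF-to-PDF conversion.
        byte[] converted = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(converted));
    }
}
```

If the result is "pdf", the bytes have to be rendered to a bitmap by a PDF renderer before they can be treated as an image; reinterpreting them as an image will not work.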
I'm having an issue with producing a PDF with an image in iText.
We're able to produce the text of the document using iText, but the image is not coming through.
We are following the text in their ebook at https://kb.itextpdf.com/home/it7kb/ebooks/itext-7-converting-html-to-pdf-with-pdfhtml/chapter-1-hello-html-to-pdf
The code in question is below:
void createPdf(string baseUri, string html, string dest)
{
    ConverterProperties properties = new ConverterProperties();
    properties.SetBaseUri(baseUri);
    HtmlConverter.ConvertToPdf(html, new FileStream(dest, FileMode.Create), properties);
}
As far as we can see, the issue is around the string baseUri.
We have assumed that this is the directory where the image is held in our C# project in Visual Studio, and have so far tried the following strings to no avail:
/Images/
/Images/NewLogo.png
http://localhost:64070/Images/NewLogo.png
None of these have produced the image in the PDF and any help or suggestions would be greatly appreciated.
We have found that if we set the BaseUri to the location of an image at a URL, we are able to produce an image in the PDF.
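One thing worth checking is how a relative image path combines with each candidate base URI. The sketch below uses only java.net.URI from the Java standard library to show generic URI resolution. It does not call iText, and pdfHTML's own resource resolver may treat local file paths differently. The src values are hypothetical, because the HTML is not shown in the question.

```java
import java.net.URI;

// Hypothetical demo class: shows how a relative src resolves against a base URI.
class BaseUriDemo {
    static String resolve(String baseUri, String src) {
        return URI.create(baseUri).resolve(src).toString();
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"http://localhost:64070/", "Images/NewLogo.png"},
            {"http://localhost:64070/Images/", "NewLogo.png"},
            {"http://localhost:64070/Images/", "Images/NewLogo.png"},
            {"http://localhost:64070/Images/NewLogo.png", "Images/NewLogo.png"},
        };
        for (String[] c : cases) {
            System.out.println(c[0] + " + " + c[1] + " -> " + resolve(c[0], c[1]));
        }
    }
}
```

The last two cases resolve to a doubled /Images/Images/ path. So if the HTML already says src="Images/NewLogo.png", the base should be the site or project root, not the Images folder or the image file itself.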
I used iText 7 in my C# code to create a PDF file, as I said in my other question here:
Itext7 not showing arabic text
I gave up on trying to fix it, because it seems I would need to pay for the add-on, and I can't do that.
I tried PDFsharp; it showed the Arabic letters, but they were disconnected and reversed, and writing the Arabic backward did not make the letters connect.
I used the SautinSoft library, and it created a Word document where Arabic works fine, but it has a footer saying that it is a free version, so I can't use this one either.
The PDF created by this library also doesn't support Arabic.
So I think I can't write a PDF in Arabic; none of the libraries I tried supported it.
Is there any way to fix it?
Or can anyone please suggest another library that can create an Arabic PDF or Word document without watermarks or footers?
I found a solution using GemBox PDF. It only allows 20 paragraphs, but that is more than enough.
What about DocumentCore?
public static void SecureDocument()
{
    string filePath = @"ProtectedDocument.pdf";
    DocumentCore dc = new DocumentCore();
    // Create a simple document.
    dc.Content.End.Insert("Hello World!!!", new CharacterFormat() { FontName = "Verdana", Size = 65.5f, FontColor = Color.Orange });
    PdfSaveOptions so = new PdfSaveOptions();
    // Password protection.
    so.EncryptionDetails.UserPassword = "12345";
    // Encryption algorithm.
    so.EncryptionDetails.EncryptionAlgorithm = PdfEncryptionAlgorithm.RC4_128;
    // Available permissions: content copying, commenting, printing, changing the document, filling in form fields.
    // Here only printing is allowed.
    so.EncryptionDetails.Permissions = PdfPermissions.Printing;
    // Save the document as a PDF file with the security options.
    dc.Save(filePath, so);
    // Open the result for demonstration purposes.
    System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(filePath) { UseShellExecute = true });
}
How can I read pdf files and save contents to a text file using Spire.PDF?
For example: Here is a pdf file and here is the desired text file from that pdf
I tried the below code to read the file and save it to a text file
PdfDocument doc = new PdfDocument();
doc.LoadFromFile(@"C:\Users\Tamal\Desktop\101395a.pdf");
StringBuilder buffer = new StringBuilder();
foreach (PdfPageBase page in doc.Pages)
{
    buffer.Append(page.ExtractText());
}
doc.Close();
string fileName = @"C:\Users\Tamal\Desktop\101395a.txt";
File.WriteAllText(fileName, buffer.ToString());
System.Diagnostics.Process.Start(fileName);
But the output text file is not properly formatted. It has unnecessary whitespace, complete paragraphs are broken into multiple lines, and so on.
How do I get the desired result as in the desired text file?
Additionally, is it possible to detect and mark (for example, by adding a tag) text that is bold, italic, or underlined? Things also get more problematic for pages that have multiple columns of text.
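For the whitespace and broken-paragraph part of the question, one possible approach is to post-process the extracted string instead of relying on the extractor's layout. The sketch below is library-independent Java. The class name ExtractedTextCleaner is hypothetical, and the rule it applies (blank lines separate paragraphs, every other line break is a soft wrap) is an assumption that will not hold for every PDF. It does not address bold or italic detection or multi-column pages.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-processor for text returned by a PDF text extractor.
class ExtractedTextCleaner {
    // Collapses runs of whitespace and joins soft-wrapped lines into paragraphs.
    // Assumption: one or more blank lines mark a paragraph boundary.
    static String clean(String raw) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : raw.split("\\R", -1)) {
            String trimmed = line.trim().replaceAll("\\s+", " ");
            if (trimmed.isEmpty()) {
                // Blank line: close the paragraph collected so far, if any.
                if (current.length() > 0) {
                    paragraphs.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) current.append(' ');
                current.append(trimmed);
            }
        }
        if (current.length() > 0) paragraphs.add(current.toString());
        return String.join("\n\n", paragraphs);
    }

    public static void main(String[] args) {
        String raw = "A   complete paragraph\nbroken into   several\nlines.\n\n\n  Second    paragraph. ";
        System.out.println(clean(raw));
    }
}
```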
Using iText
File inputFile = new File("input.pdf");
PdfDocument pdfDocument = new PdfDocument(new PdfReader(inputFile));
SimpleTextExtractionStrategy stes = new SimpleTextExtractionStrategy();
PdfCanvasProcessor canvasProcessor = new PdfCanvasProcessor(stes);
canvasProcessor.processPageContent(pdfDocument.getPage(1));
System.out.println(stes.getResultantText());
This is (as the code says) a basic/simple text extraction strategy.
More advanced examples can be found in the documentation.
Use IronOCR
var Ocr = new IronOcr.AutoOcr();
var Results = Ocr.ReadPdf(@"E:\Demo.pdf");
File.WriteAllText(@"E:\Demo.txt", Convert.ToString(Results));
For reference, see https://ironsoftware.com/csharp/ocr/
Using this, you should get formatted text output, though not the exact output you want.
If you want exact pre-interpreted output, you should check paid OCR SDKs such as the OmniPage Capture SDK and the ABBYY FineReader SDK.
That is the nature of PDF. It basically says "go to this location on a page and place this character there." I'm not familiar at all with Spire.PDF; I work with Java and the PDFBox library, but any attempt to extract text from PDF is heuristic and hence imperfect. This is a problem that has received considerable attention, and some applications have better results than others, so you may want to survey all available options. Still, I think you'll have to clean up the result.
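To make the "heuristic" part concrete, here is a minimal sketch of how an extractor might rebuild lines from positioned text. It is not taken from PDFBox, Spire.PDF, or iText. The Chunk type, the toLines method, and both thresholds are hypothetical, and real extractors use considerably more elaborate rules.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model: each chunk is a piece of text placed at (x, y) with a known width.
class PositionedTextDemo {
    static final class Chunk {
        final double x, y, width;
        final String text;
        Chunk(double x, double y, double width, String text) {
            this.x = x; this.y = y; this.width = width; this.text = text;
        }
    }

    // Groups chunks whose baselines differ by at most yTolerance into one line,
    // orders each line left to right, and inserts a space where the horizontal
    // gap between neighbours exceeds gapForSpace.
    static List<String> toLines(List<Chunk> chunks, double yTolerance, double gapForSpace) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        // PDF user space normally has y growing upward, so larger y is higher on the page.
        sorted.sort(Comparator.comparingDouble((Chunk c) -> c.y).reversed());

        List<List<Chunk>> lines = new ArrayList<>();
        for (Chunk c : sorted) {
            List<Chunk> last = lines.isEmpty() ? null : lines.get(lines.size() - 1);
            if (last != null && Math.abs(last.get(0).y - c.y) <= yTolerance) {
                last.add(c);
            } else {
                List<Chunk> line = new ArrayList<>();
                line.add(c);
                lines.add(line);
            }
        }

        List<String> result = new ArrayList<>();
        for (List<Chunk> line : lines) {
            line.sort(Comparator.comparingDouble(c -> c.x));
            StringBuilder sb = new StringBuilder();
            Chunk prev = null;
            for (Chunk c : line) {
                if (prev != null && c.x - (prev.x + prev.width) > gapForSpace) sb.append(' ');
                sb.append(c.text);
                prev = c;
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Chunk> chunks = new ArrayList<>();
        chunks.add(new Chunk(60, 700.0, 30, "World"));
        chunks.add(new Chunk(10, 700.4, 30, "Hello"));
        chunks.add(new Chunk(10, 680.0, 24, "Next"));
        System.out.println(toLines(chunks, 2.0, 1.0));
    }
}
```

Both thresholds are guesses that depend on font size and layout. A two-column page also defeats this grouping, because chunks from both columns share the same baseline and end up merged into one line.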
I'm new to Stack Overflow, C#, and the OneNote interop COM API. I'm trying to display a PDF file in OneNote using C# and the OneNote COM/interop API (I'd rather not use the REST API).
I am able to display a link to a PDF file using the tag <InsertedFile pathSource="[myfilepath]" preferredName="[myPreferredName]"> in conjunction with the UpdatePageContent function in the interop API, but this doesn't display the PDF.
I have been able to get my program to display an image in OneNote using the following code to create the image tag:
private XElement createImageTag(Image image)
{
    string OneNoteNamespace = "http://schemas.microsoft.com/office/onenote/2013/onenote";
    var img = new XElement(XName.Get("Image", OneNoteNamespace));
    var data = new XElement(XName.Get("Data", OneNoteNamespace));
    data.Value = this.toBase64(image);
    img.Add(data);
    return img;
}
private string toBase64(Image image)
{
    using (var memoryStream = new MemoryStream())
    {
        image.Save(memoryStream, ImageFormat.Png);
        var binary = memoryStream.ToArray();
        return Convert.ToBase64String(binary);
    }
}
I tried altering this for a PDF instead of an image by converting the PDF to a byte array, converting that to base64, and assigning the result as data.Value in the createImageTag function, but it did not result in a displayed PDF either (presumably because OneNote was expecting an image and not a PDF). I'd like to avoid using third-party libraries or extensions to convert a PDF to an image if possible, and I haven't found any other ways to convert a PDF to an image.
I used ONOMSpy to look for any other OneNote XML tags that might help me display a PDF in OneNote, but didn't see any besides the Image and InsertedFile tags that looked close to doing what I wanted.
So if you could help me either:
1) find an easy way to convert a PDF to an image using C#, or
2) show me how to tell OneNote to display the PDF,
I'd really appreciate it. Thanks!
Currently I am using the following code, together with some DLL files from PDFBox:
FileInfo file = new FileInfo("c://aa.pdf");
PDDocument doc = PDDocument.load(file.FullName);
PDFTextStripper pdfStripper = new PDFTextStripper();
string text = pdfStripper.getText(doc);
doc.close();
richTextBox1.Text = text;
Using this code I am able to get the text, but not in the correct format. Please give me some ideas.
Extracting the text from a pdf file is anything but trivial.
To quote from the iTextSharp tutorial:
"The pdf format is just a canvas where text and graphics are placed without any structure information. As such there aren't any 'iText-objects' in a PDF file. In each page there will probably be a number of 'Strings', but you can't reconstruct a phrase or a paragraph using these strings. There are probably a number of lines drawn, but you can't retrieve a Table-object based on these lines. In short: parsing the content of a PDF-file is NOT POSSIBLE with iText."
There are several commercial applications which claim to be able to do it. Caveat Emptor.
There is also a free software library called Poppler (http://poppler.freedesktop.org/), which is used by the PDF viewers of GNOME and KDE. It includes a command-line utility called pdftotext, but I have no experience with it. It may be your best free option.
There is a blog article explaining the issues with PDF text extraction in general at http://pdf.jpedal.org/java-pdf-blog/bid/12670/PDF-text