I'm having a go with DocumentViewer and XPS at the moment, as I haven't tried them before. I have a simple piece of code that loads an XPS document and displays it in the document viewer, but the document doesn't appear. The document viewer loads, and a quick step through in debug mode tells me the information is there; it just won't show.
dvDoc = new DocumentViewer();
string fileName = null;
string appPath = System.IO.Path.GetDirectoryName(Assembly.GetAssembly(typeof(DocumentWindow)).CodeBase);
if (type == "About")
fileName = appPath + @"\Documents\About.xps";
fileName = fileName.Remove(0, 6);
XpsDocument doc = new XpsDocument(fileName, FileAccess.Read);
dvDoc.Document = doc.GetFixedDocumentSequence();
All the literature I can find tells me to do it this way, yet it doesn't seem to work for me. I'm aware that the document viewer doesn't like URIs, hence the fileName.Remove line.
Any suggestions as to what I'm missing?
Cheers,
SumGuy
You've probably already figured this out by now since it's been almost a month.
It doesn't look like your document viewer is part of your xaml file. It looks like you are creating a new DocumentViewer object, but never adding it to the xaml file.
Instead of
dvDoc = new DocumentViewer();
Declare it in your xaml file:
<DocumentViewer Name="dvDoc" />
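A side note on the path handling in the question: Assembly.CodeBase returns a file:// URI, which is why the code strips the first six characters after calling GetDirectoryName. A minimal sketch of a less brittle alternative is to let System.Uri do the conversion through its LocalPath property. The class and method names below (DocumentPaths, ToLocalPath, DocumentPath) are hypothetical and only for illustration; the Documents folder and About.xps come from the question.

```csharp
using System;
using System.IO;

static class DocumentPaths
{
    // Hypothetical helper: turns a file:// URI string (the shape that
    // Assembly.CodeBase returns) into a local file-system path.
    public static string ToLocalPath(string codeBase)
    {
        return new Uri(codeBase).LocalPath;
    }

    // Builds <application folder>/Documents/<fileName> from the CodeBase string.
    public static string DocumentPath(string codeBase, string fileName)
    {
        string appFolder = Path.GetDirectoryName(ToLocalPath(codeBase));
        return Path.Combine(appFolder, "Documents", fileName);
    }
}
```

With this in place, the question's fileName could be built as DocumentPaths.DocumentPath(Assembly.GetAssembly(typeof(DocumentWindow)).CodeBase, "About.xps"), with no Remove(0, 6) call. This only tidies the path; the viewer still has to be part of the window, as the answer above says.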
I have been working on getting a PDF viewer working in .NET MAUI. In Xamarin, I displayed a PDF in a WebView without a problem. In MAUI, you don't do that. I installed Spire.PDF (free) and have it loading the PDF from a file. When I add it to a VerticalStackLayout (which will be the Content for a ScrollView), it fails on the conversion to IView. Any ideas or suggestions?
VerticalStackLayout vsl = new VerticalStackLayout();
PdfDocument viewPdf = new PdfDocument();
Assembly.GetExecutingAssembly().GetType().GetTypeInfo().Assembly.GetManifestResourceStream("CommanderGrady.Resources.Images.adventure218.pdf");
viewPdf.LoadFromFile(@"C:\Users\leuol\source\repos\CommanderGrady\CommanderGrady\Resources\Images\adventure218.pdf");
vsl.Add((IView)viewPdf);
return vsl;
}
I have done a sample and the pdf can show in the browser. So you can try the following code.
string fullpath = xxx; // provide the file path
await Launcher.OpenAsync(new OpenFileRequest
{
File = new ReadOnlyFile(fullpath)
});
I'm trying to use IronOCR to create text-searchable versions of scanned PDF documents. The outputted file is displayed properly (and is text-selectable) in pretty much every viewer, except for Chrome's built-in PDF viewer.
Here's my code for converting the files:
byte[] origPdfBytes = Properties.Resources.Non_text_searchable;
using (MemoryStream pdfStream = new MemoryStream(origPdfBytes))
{
var ocr = new IronTesseract();
using (OcrInput input = new OcrInput())
{
input.AddPdf(pdfStream);
OcrResult ocrResult = ocr.Read(input);
ocrResult.SaveAsSearchablePdf("C:\\temp\\OCRTest\\output.pdf");
}
}
Here is a sample file that I've converted using IronOCR: https://drive.google.com/file/d/1_uhmZKJN_TFStApfeeieI8LezAPjp1mj/view
If you download and view this file in pretty much any viewer other than Chrome, the text is properly selectable. However, in Chrome, the cursor does appear to be selecting text, but it does not display properly.
I've used Chrome's built-in PDF viewer for years, and I've never run into an issue like this. I'm not sure if this is an issue with IronOCR's output formatting, or if it's just a problem with Chrome. Any ideas?
How can I read pdf files and save contents to a text file using Spire.PDF?
For example: Here is a pdf file and here is the desired text file from that pdf
I tried the below code to read the file and save it to a text file
PdfDocument doc = new PdfDocument();
doc.LoadFromFile(@"C:\Users\Tamal\Desktop\101395a.pdf");
StringBuilder buffer = new StringBuilder();
foreach (PdfPageBase page in doc.Pages)
{
buffer.Append(page.ExtractText());
}
doc.Close();
String fileName = @"C:\Users\Tamal\Desktop\101395a.txt";
File.WriteAllText(fileName, buffer.ToString());
System.Diagnostics.Process.Start(fileName);
But the output text file is not properly formatted. It has unnecessary whitespace, and a complete paragraph is broken into multiple lines, etc.
How do I get the desired result as in the desired text file?
Additionally, is it possible to detect and mark (e.g. add a tag to) text in bold, italic or underlined form as well? Things also get more problematic for pages that have multiple columns of text.
Using iText
File inputFile = new File("input.pdf");
PdfDocument pdfDocument = new PdfDocument(new PdfReader(inputFile));
SimpleTextExtractionStrategy stes = new SimpleTextExtractionStrategy();
PdfCanvasProcessor canvasProcessor = new PdfCanvasProcessor(stes);
canvasProcessor.processPageContent(pdfDocument.getPage(1));
System.out.println(stes.getResultantText());
This is (as the code says) a basic/simple text extraction strategy.
More advanced examples can be found in the documentation.
Use IronOCR
var Ocr = new IronOcr.AutoOcr();
var Results = Ocr.ReadPdf(@"E:\Demo.pdf");
File.WriteAllText(@"E:\Demo.txt", Convert.ToString(Results));
For reference https://ironsoftware.com/csharp/ocr/
Using this you should get formatted text output, but not the exact output you want.
If you want exact, pre-interpreted output, you should check paid OCR products such as the OmniPage Capture SDK and the ABBYY FineReader SDK.
That is the nature of PDF. It basically says "go to this location on a page and place this character there." I'm not familiar at all with Spire.PDF; I work with Java and the PDFBox library, but any attempt to extract text from PDF is heuristic and hence imperfect. This is a problem that has received considerable attention and some applications have better results than others, so you may want to survey all available options. Still, I think you'll have to clean up the result.
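As an illustration of the kind of clean-up meant here, the following is a minimal sketch, not a general solution. It assumes that blank lines in the extracted text mark paragraph boundaries and that single line breaks inside a paragraph are just wrapping; real PDF output often violates both assumptions, and multi-column pages are not handled at all. The class name ExtractedTextCleaner is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class ExtractedTextCleaner
{
    // Heuristic: blank lines separate paragraphs; line breaks inside a
    // paragraph are treated as wrapping and replaced by a single space.
    public static string Clean(string raw)
    {
        // Normalize Windows and old Mac line endings to '\n'.
        string text = raw.Replace("\r\n", "\n").Replace('\r', '\n');

        // Split into paragraphs on a line break, optional whitespace, line break.
        string[] paragraphs = Regex.Split(text, @"\n\s*\n");

        for (int i = 0; i < paragraphs.Length; i++)
        {
            // Collapse every whitespace run (spaces, tabs, newlines) into one space.
            paragraphs[i] = Regex.Replace(paragraphs[i], @"\s+", " ").Trim();
        }

        // Drop empty paragraphs and rejoin with one blank line between them.
        return string.Join("\n\n", Array.FindAll(paragraphs, p => p.Length > 0));
    }
}
```

In the question's code this could be applied as File.WriteAllText(fileName, ExtractedTextCleaner.Clean(buffer.ToString())). Detecting bold or italic text is a separate problem that this sketch does not touch.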
After struggling for a whole day, I identified the issue, but that didn't solve my problem.
In short:
I need to open a PDF, convert it to black and white (grayscale), search for some words, and insert notes near the words found. At first glance it seems easy, but I discovered how hard PDF files are to process (they have no concept of "words", and so on).
The first task, converting to grayscale, just drove me crazy. I didn't find a working solution, either commercial or free. I came up with this workaround:
open the PDF
print it through a Windows printer driver, using one of the free PDF printers
This is quite ugly, since I will force users of the C# application to install such third-party software, but that is it for the moment. I tested FreePDF, CutePDF and PDFCreator. All of them work "stand-alone" as expected.
Now, when printing from C#, I obviously don't want the print dialog; I just want to select the BW option and print (i.e. convert).
The following code just uses a PDF library, shown for clarity only.
Aspose.Pdf.Facades.PdfViewer viewer = new Aspose.Pdf.Facades.PdfViewer();
viewer.BindPdf(txtPDF.Text);
viewer.PrintAsGrayscale = true;
//viewer.RenderingOptions = new RenderingOptions { UseNewImagingEngine = true };
//Set attributes for printing
//viewer.AutoResize = true; //Print the file with adjusted size
//viewer.AutoRotate = true; //Print the file with adjusted rotation
viewer.PrintPageDialog = false; //Do not produce the page number dialog when printing
////PrinterJob printJob = PrinterJob.getPrinterJob();
//Create objects for printer and page settings and PrintDocument
System.Drawing.Printing.PrinterSettings ps = new System.Drawing.Printing.PrinterSettings();
System.Drawing.Printing.PageSettings pgs = new System.Drawing.Printing.PageSettings();
//System.Drawing.Printing.PrintDocument prtdoc = new System.Drawing.Printing.PrintDocument();
//prtdoc.PrinterSettings = ps;
//Set printer name
//ps.PrinterName = prtdoc.PrinterSettings.PrinterName;
ps.PrinterName = "CutePDF Writer";
ps.PrintToFile = true;
ps.PrintFileName = @"test.pdf";
//
//ps.
//Set PageSize (if required)
//pgs.PaperSize = new System.Drawing.Printing.PaperSize("A4", 827, 1169);
//Set PageMargins (if required)
//pgs.Margins = new System.Drawing.Printing.Margins(0, 0, 0, 0);
//Print document using printer and page settings
viewer.PrintDocumentWithSettings(ps);
//viewer.PrintDocument();
//Close the PDF file after printing
What I discovered, and what seems to be poorly documented, is that if you set
ps.PrintToFile = true;
then regardless of the C# PDF library or the PDF printer driver, Windows will skip the PDF driver and, instead of a PDF file, output a PS (PostScript) file, which obviously will not be recognized by Adobe Reader.
Now the question (and I am sure that others who want to print PDFs from C# will run into this) is how to print to CutePDF, for example, and still suppress any filename dialog.
In other words, how to print silently with a programmatically selected filename from a C# application, or somehow convince "print to file" to go through the PDF driver rather than the Windows default PS driver.
Thanks very much for any hints.
I solved the conversion to grayscale with a commercial component, as described in this post, and I also posted my complete solution there in case anyone struggles like I did.
Converting PDF to Grayscale pdf using ABC PDF
I am trying to set the page margins on an Excel spreadsheet that is being generated with the Open XML SDK. I am not opening an Excel document that already exists; it is generated from scratch. I am using the PageMargins class but am not sure how to attach this instance to the worksheet. The SDK Productivity Tool gives this code:
PageMargins pageMargins1 = worksheet.GetFirstChild<PageMargins>();
pageMargins1.Left = 0.45D;
pageMargins1.Right = 0.45D;
pageMargins1.Top = 0.5D;
pageMargins1.Bottom = 0.5D;
The GetFirstChild() function returns null. I also tried to do
worksheet.Append(pageMargins1);
but no luck.
Also, using the code from this example (How change excel 2007 document orientation to landscape by OpenXML sdk)
to set the page orientation does not work when the document is created from scratch. How do you add a PageSetup and a PageMargins instance to the document?
Does anyone have knowledge of this SDK and know how to use the margins or page setup classes?
The order that things are appended to the worksheet is important. PageMargins needs to be appended before PageSetup, and they both need to be at the end after the SheetData. Also, all settings need to be set. I used this code:
PageMargins pageMargins1 = new PageMargins();
pageMargins1.Left = 0.45D;
pageMargins1.Right = 0.45D;
pageMargins1.Top = 0.5D;
pageMargins1.Bottom = 0.5D;
pageMargins1.Header = 0.3D;
pageMargins1.Footer = 0.3D;
worksheetPart.Worksheet.AppendChild(pageMargins1);
PageSetup pageSetup = new PageSetup();
pageSetup.Orientation = OrientationValues.Landscape;
pageSetup.FitToHeight = 2;
pageSetup.HorizontalDpi = 200;
pageSetup.VerticalDpi = 200;
worksheetPart.Worksheet.AppendChild(pageSetup);
A neat trick to use along with the SDK Productivity Tool is to rename the .xlsx file to .zip, then extract the contents. Then open /xl/worksheets/sheet.xml and compare the markup to the markup of an Excel file created by Excel.
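The same comparison can be done without renaming the file, since an .xlsx package is a zip archive. The sketch below is only a rough aid, assuming the worksheet part is stored as xl/worksheets/sheet1.xml (the entry name varies between files, so treat it as a parameter); the class name WorksheetInspector is hypothetical. It lists the direct children of the worksheet element in document order, which makes it easy to see whether pageMargins and pageSetup come after sheetData.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

static class WorksheetInspector
{
    // Returns the local names of the worksheet root's direct children,
    // in the order they appear in the markup.
    public static List<string> ChildOrder(string sheetXml)
    {
        return XDocument.Parse(sheetXml).Root
            .Elements()
            .Select(e => e.Name.LocalName)
            .ToList();
    }

    // Reads one worksheet part straight out of the .xlsx package.
    // The default entry name is an assumption; pass the real one if it differs.
    public static string ReadSheetXml(string xlsxPath, string entryName = "xl/worksheets/sheet1.xml")
    {
        using (ZipArchive zip = ZipFile.OpenRead(xlsxPath))
        {
            ZipArchiveEntry entry = zip.GetEntry(entryName);
            if (entry == null)
                throw new FileNotFoundException("Worksheet part not found: " + entryName);

            using (var reader = new StreamReader(entry.Open()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Running ChildOrder(ReadSheetXml(path)) on both the generated file and a file saved by Excel gives two short lists that can be compared side by side.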