Ghostscript.NET image text quality issue - c#

I am attempting to convert a PDF document to images using Ghostscript. The desired resolution is set to 72 dpi, which should be high enough for text to display clearly, but most of the text is illegible.
I can raise the dpi, but that produces very large image files, which I would prefer to avoid.
I know there are Ghostscript arguments for anti-aliasing etc. (e.g. -dDOINTERPOLATE). How do I add them to the following piece of code, or is there a better way to do this?
int desired_x_dpi = 72;
int desired_y_dpi = 72;
GhostscriptRasterizer _rasterizer = new GhostscriptRasterizer();
_rasterizer.Open(inputPdfPath, localDllInfo, false);
for (int pageNumber = 1; pageNumber <= _rasterizer.PageCount; pageNumber++)
{
    string pageFilePath = Path.Combine(outputPath, "Page-" + pageNumber.ToString() + ".png");
    Image img = _rasterizer.GetPage(desired_x_dpi, desired_y_dpi, pageNumber);
    img.Save(pageFilePath, ImageFormat.Png);
}

In 1.1.9 the GhostscriptRasterizer has -dDOINTERPOLATE set by default. The only parameters you can control via the GhostscriptRasterizer class are TextAlphaBits and GraphicsAlphaBits.
I would recommend trying other classes from Ghostscript.NET if you want more control over the parameters.
Take a look at these samples: Image devices usage samples
You can add custom parameters (switches) this way:
GhostscriptPngDevice dev = new GhostscriptPngDevice(GhostscriptPngDeviceType.Png16m);
dev.GraphicsAlphaBits = GhostscriptImageDeviceAlphaBits.V_4;
dev.TextAlphaBits = GhostscriptImageDeviceAlphaBits.V_4;
dev.ResolutionXY = new GhostscriptImageDeviceResolution(96, 96);
dev.InputFiles.Add(@"E:\gss_test\indispensable.pdf");
dev.Pdf.FirstPage = 2;
dev.Pdf.LastPage = 4;
dev.CustomSwitches.Add("-dDOINTERPOLATE"); // custom parameter
dev.OutputPath = @"E:\gss_test\output\indispensable_color_page_%03d.png";
dev.Process();
When I find some time, I will extend GhostscriptRasterizer to accept custom parameters in the Open method for the Ghostscript.NET v.1.2.0 release.

I had the same problem. I fixed it by adding a resolution switch to the GhostscriptRasterizer's CustomSwitches:
using (var rasterizer = new GhostscriptRasterizer())
{
    rasterizer.CustomSwitches.Add("-r500x500");
    // ...other code here
}
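To make the trade-off between resolution and image size in this thread concrete, here is a rough sketch of the arithmetic. It assumes a US Letter page (8.5 x 11 in), which the question does not state, and the DpiMath class and its method names are hypothetical helpers, not part of Ghostscript.NET.

```csharp
using System;

public static class DpiMath
{
    // Pixel dimensions of a page of the given physical size at the given resolution.
    public static (int Width, int Height) PixelSize(double widthInches, double heightInches, int dpi)
    {
        return ((int)Math.Round(widthInches * dpi), (int)Math.Round(heightInches * dpi));
    }

    // Approximate pixel height of text of the given point size (1 pt = 1/72 in).
    public static double TextHeightPixels(double pointSize, int dpi)
    {
        return pointSize * dpi / 72.0;
    }
}
```

Under that assumption, a page at 72 dpi is 612 x 792 pixels and 10 pt text is only about 10 pixels tall, which goes some way to explaining the illegible output. At 500 dpi the same page is 4250 x 5500 pixels, roughly 48 times as many pixels, so an intermediate resolution combined with the alpha-bits settings mentioned above may be a better compromise.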

Related

Make every page of a PDF into an image (contained in a pdf) in C#

I have a lot of PDF files with text. To prevent copying, I added a watermark; however, the watermark is easily removed simply by editing the PDF.
Using C#, how can I convert a PDF into a PDF with each page being an image of the text? I understand this isn't foolproof, as OCR can be used to extract the text, but I want to make it that little bit harder.
Thanks for your help.
I used Ghostscript.Net (https://github.com/jhabjan/Ghostscript.NET) to break up each page into a bitmap which you can convert into any other format you want:
using Ghostscript.NET.Rasterizer;
...
using (GhostscriptRasterizer raster = new GhostscriptRasterizer())
{
    raster.Open(filename);
    pages = raster.PageCount;
    _bitpages = new Bitmap[raster.PageCount];
    for (int i = 1; i < pages + 1; i++)
    {
        _bitpages[i - 1] = (Bitmap)raster.GetPage(dpi, dpi, i);
        // convert and save image here
    }
    raster.Close();
}

How to use IronOCR for reading from a directory instead of a single file?

I am using the IronOCR library to read text from a particular area within a PNG image, which works fine. However, I need to do the same for every image in a directory/folder and save the results to another directory/folder.
Here's the code that I am currently using:
public void workOCR()
{
    var ocr = new AutoOcr();
    var Area = new Rectangle()
    {
        X = 1367,
        Y = 420,
        Height = 57,
        Width = 411
    };
    var result = ocr.Read(@"C:\Users\fasihullahkhan\Desktop\pdftopng\page3.png", Area);
    label1.Text = result.Text;
}
What I want to do is specify a directory like this:
ocr.Read(@"C:\Users\fasihullahkhan\Desktop\pdftopng\", Area);
From within a loop, but I don't know how. Please help.
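One way this could be approached is to enumerate the files in the folder and call the OCR routine once per file. The sketch below is a minimal version under a few assumptions: the BatchOcr class and ProcessDirectory method are hypothetical names, the OCR call is passed in as a delegate so the loop itself does not depend on IronOCR, and each result is written to a .txt file named after the source image.

```csharp
using System;
using System.IO;

public static class BatchOcr
{
    // Runs readText on every PNG in inputDir and writes each result to
    // outputDir as <image name>.txt. Returns the number of files processed.
    public static int ProcessDirectory(string inputDir, string outputDir, Func<string, string> readText)
    {
        Directory.CreateDirectory(outputDir); // does nothing if the folder already exists
        int count = 0;
        foreach (string imagePath in Directory.EnumerateFiles(inputDir, "*.png"))
        {
            string text = readText(imagePath);
            string outputPath = Path.Combine(outputDir, Path.GetFileNameWithoutExtension(imagePath) + ".txt");
            File.WriteAllText(outputPath, text);
            count++;
        }
        return count;
    }
}
```

With the ocr and Area objects from the question, the call would presumably look like BatchOcr.ProcessDirectory(inputFolder, outputFolder, path => ocr.Read(path, Area).Text), where inputFolder and outputFolder are whatever paths you choose.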

Image from resources as a footer in excel

How can I use an image from resources as a footer in a generated Excel file?
This definitely will not work:
xlWorkSheet.PageSetup.CenterFooterPicture = Properties.Resources.stopka;
since:
Cannot implicitly convert type 'System.Drawing.Bitmap' to 'Microsoft.Office.Interop.Excel.Graphic'
OK, this works:
xlWorkSheet.PageSetup.CenterFooterPicture.Filename = Application.StartupPath + "\\stopka.png";
xlWorkSheet.PageSetup.CenterFooterPicture.LockAspectRatio = Microsoft.Office.Core.MsoTriState.msoTrue;
xlWorkSheet.PageSetup.CenterFooterPicture.Width = 590;
xlWorkSheet.PageSetup.CenterFooter = "&G";
But it is not what I needed. I would like to get the image from the project resources, not from the application folder.
This works:
System.Reflection.Assembly CurrAssembly = System.Reflection.Assembly.LoadFrom(System.Windows.Forms.Application.ExecutablePath);
System.IO.Stream stream = CurrAssembly.GetManifestResourceStream("Oferty_BMGRP.Resources.stopka.png");
string temp = Path.GetTempFileName();
System.Drawing.Image.FromStream(stream).Save(temp);
xlWorkSheet.PageSetup.CenterFooterPicture.Filename = temp; //Application.StartupPath + "\\Resources\\stopka.png";
xlWorkSheet.PageSetup.CenterFooterPicture.LockAspectRatio = Microsoft.Office.Core.MsoTriState.msoTrue;
xlWorkSheet.PageSetup.CenterFooterPicture.Width = 590;
xlWorkSheet.PageSetup.CenterFooter = "&G";
The image "stopka.png" had to be set as an embedded resource.
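A possible variation on the temp-file approach above, sketched under some assumptions: ResourceExtractor and ExtractToTempFile are hypothetical names, and the idea is to copy the embedded bytes straight to disk instead of decoding and re-saving them through System.Drawing. It also disposes the streams and gives the temp file a real image extension, which Path.GetTempFileName does not; whether Excel cares about the extension is something I have not verified.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class ResourceExtractor
{
    // Copies an embedded resource to a new file in the temp folder and returns its path.
    public static string ExtractToTempFile(Assembly assembly, string resourceName, string extension)
    {
        using (Stream source = assembly.GetManifestResourceStream(resourceName))
        {
            // GetManifestResourceStream returns null when the name does not match.
            if (source == null)
                throw new FileNotFoundException("Embedded resource not found: " + resourceName);

            string tempPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + extension);
            using (FileStream target = File.Create(tempPath))
            {
                source.CopyTo(target);
            }
            return tempPath;
        }
    }
}
```

It would be used roughly as string temp = ResourceExtractor.ExtractToTempFile(Assembly.GetExecutingAssembly(), "Oferty_BMGRP.Resources.stopka.png", ".png"); followed by the same CenterFooterPicture lines as above. The temp file should be deleted once the workbook has been saved.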
Microsoft doesn't provide a setter for the CenterFooterPicture property, so an image cannot be assigned to it directly.
The MSDN documentation describes it as follows:
CenterFooterPicture - Returns a Graphic object that represents the picture for the center section of the footer. Used to set attributes about the picture.

ImageResizer - not resaving image if it's smaller than requested size

OK, I am trying to use the ImageResizer component in my web app. I have the following code:
var versions = new Dictionary<string, string>();
//Define the versions to generate
versions.Add("_001", "maxwidth=300&maxheight=300&format=jpg");
versions.Add("_002", "maxwidth=600&maxheight=600&format=jpg");
versions.Add("_003", "maxwidth=1920&maxheight=1080&format=jpg&process=no"); // I expect it not to resave the image if original is smaller
string uploadFolder = "...my folder path...";
if (!Directory.Exists(uploadFolder))
    Directory.CreateDirectory(uploadFolder);
//Generate each version
foreach (string suffix in versions.Keys)
{
    //Generate a filename (GUIDs are best).
    string fileName = Path.Combine(uploadFolder, DEFAULT_IMAGE_NAME + suffix);
    //Let the image builder add the correct extension based on the output file type
    fileName = ImageBuilder.Current.Build(file, fileName, new ResizeSettings(versions[suffix]), false, true);
}
file.SaveAs(uploadFolder + DEFAULT_IMAGE_NAME + "_000.jpg");
As you can tell, I am saving 3 versions of one image plus the original image. However, I only want the image to be re-encoded and re-saved if resizing is required. So if I upload a 1000x1000 image, I would expect main_000.jpg and main_003.jpg to be the same. However, that's not the case (ImageResizer resizes that image as well, and often the saved file is bigger than main_000.jpg).
I tried adding process=no as a parameter, but it's not working. Does anyone know whether this scenario is supported and which parameter I need to add?
//it may need to be improved
Dictionary<string, SavingSettings> SaveVersions = new Dictionary<string, SavingSettings>();
public void page_load(object sender, EventArgs e) {
    //set versions:
    SaveVersions.Add("xxl", new SavingSettings("xxl", new ImageResizer.ResizeSettings())); //original size
    SaveVersions.Add("600px", new SavingSettings("600px", new ImageResizer.ResizeSettings(600, 600, ImageResizer.FitMode.Max, "jpg"))); //big
    SaveVersions.Add("80px", new SavingSettings("80px", new ImageResizer.ResizeSettings(80, 80, ImageResizer.FitMode.Max, "jpg"))); //80 px thumb
    SaveVersions.Add("260w", new SavingSettings("260w", new ImageResizer.ResizeSettings(260, 0, ImageResizer.FitMode.Max, "jpg"))); //260 px width thumb
}
public void SaveIt(string SourceFile, string TargetFileName) {
    using (System.Drawing.Bitmap bmp = ImageResizer.ImageBuilder.Current.LoadImage(SourceFile, new ImageResizer.ResizeSettings())) {
        foreach (System.Collections.Generic.KeyValuePair<string, SavingSettings> k in SaveVersions) {
            string TargetFilePath = Server.MapPath("../img/" + k.Value.VersionName + "/" + TargetFileName + ".jpg");
            string TargetFolder = Server.MapPath("../img/" + k.Value.VersionName);
            if (!System.IO.Directory.Exists(TargetFolder)) System.IO.Directory.CreateDirectory(TargetFolder);
            if (bmp.Width > k.Value.ResizeSetting.Width || bmp.Height > k.Value.ResizeSetting.Height) {
                //you may need to resize
                ImageResizer.ImageBuilder.Current.Build(bmp, TargetFilePath, k.Value.ResizeSetting, false);
            } else {
                //just copy it
                //or in your example you can save uploaded file
                System.IO.File.Copy(SourceFile, TargetFilePath);
            }
        }
    }
}
struct SavingSettings {
    public string VersionName;
    public ImageResizer.ResizeSettings ResizeSetting;
    public SavingSettings(string VersionName, ImageResizer.ResizeSettings ResizeSetting) {
        this.VersionName = VersionName;
        this.ResizeSetting = ResizeSetting;
    }
}
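One thing that may need improving in the code above is the size check itself. It compares the bitmap against ResizeSetting.Width and ResizeSetting.Height directly, so a version that leaves a dimension unset (the "xxl" entry, or the height of "260w") would be resized for any image, assuming unset dimensions are reported as zero or a negative number; I have not checked that against every ImageResizer version. A hypothetical helper (ResizeCheck.NeedsResize is my own name) that treats non-positive bounds as "no limit" could look like this:

```csharp
public static class ResizeCheck
{
    // True only when the image exceeds a bound that is actually set.
    // Assumption: a bound of zero or less means "no limit" for that dimension.
    public static bool NeedsResize(int imageWidth, int imageHeight, int maxWidth, int maxHeight)
    {
        bool tooWide = maxWidth > 0 && imageWidth > maxWidth;
        bool tooTall = maxHeight > 0 && imageHeight > maxHeight;
        return tooWide || tooTall;
    }
}
```

For the 1000x1000 upload from the question, NeedsResize(1000, 1000, 1920, 1080) is false, so the original could simply be copied, while NeedsResize(1000, 1000, 600, 600) is true and that version would go through ImageBuilder as before.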
You need to use the URL API, not the Managed API, to perform dynamic image resizing.
Just get rid of the pre-resizing code, and save the upload to disk (make sure you sanitize the filename or use a GUID instead, however).
Then, use the URL API like this:
<img src="/uploads/original.jpg?maxwidth=300&maxheight=300&format=jpg" />

XPS Print Quality C# vs. XPS viewer

I'm having a somewhat odd print quality problem in my C# application. I have an XPS file (it's basically just a 1 page image, that was originally a scanned black and white image) that I'm trying to print to an IBM InfoPrint Mainframe driver via a C# application. I've printed to numerous other print drivers and never had a problem, but this driver gives me terrible quality with the AFP file it creates. If I open the same file in the Microsoft XPS viewer application and print to the same driver, the quality looks fine.
Trying to work through the problem, I've tried 3 or 4 different approaches to printing in the C# app. The original code did something like this (trimmed for brevity):
System.Windows.Xps.XpsDocumentWriter writer = PrintQueue.CreateXpsDocumentWriter(mPrintQueue);
mCollator = writer.CreateVisualsCollator();
mCollator.BeginBatchWrite();
ContainerVisual v = getContainerVisual(xpsFilePath);
//tried all sorts of different options on the print ticket, no effect
mCollator.Write(v,mDefaultTicket);
That code (which I've truncated) certainly could have had some weird issues in it, so I tried something much simpler:
LocalPrintServer localPrintServer = new LocalPrintServer();
PrintQueue defaultPrintQueue = LocalPrintServer.GetDefaultPrintQueue();
PrintSystemJobInfo xpsPrintJob = defaultPrintQueue.AddJob("title", xpsDocPath, false);
Same results.
I even tried using the WPF print dialog, same poor quality (http://msdn.microsoft.com/en-us/library/ms742418.aspx).
One area I haven't tried yet is using the old-school underlying print APIs, but I'm not sure why that would behave differently. One other option I have: my original document is a PDF, and I have a good 3rd party library that can make me an EMF file instead. However, every time I try to stream that EMF file to my printer, I get garbled text.
Any ideas on why this quality is lost, how to fix, or how to stream an EMF file to a print driver, would be much appreciated!
UPDATE:
One other note. This nice sample app: http://wrb.home.xs4all.nl/Articles_2010/Article_XPSViewer_01.htm experiences the same quality loss. I've also now performed tests where I open the PDF directly and render the Bitmaps to a Print Document, same fuzziness of the resulting images. If I open the PDFs in Acrobat and print they look fine.
So to close this issue, it seems that the IBM Infoprint driver (at least the way it's being used here) has quite different quality depending on how you print in C#.
In this question I was using:
System.Windows.Documents.Serialization.Write(Visual, PrintTicket);
I completely changed my approach, removing XPS entirely, and obtained an EMF (Windows metafile) rendition of my document, then sent that EMF file to the Windows printer using the Windows print event handler:
using (PrintDocument pd = new PrintDocument())
{
    pd.DocumentName = this.mJobName;
    pd.PrinterSettings.PrinterName = this.mPrinterName;
    pd.PrintController = new StandardPrintController();
    pd.PrintPage += new PrintPageEventHandler(DoPrintPage);
    pd.Print();
}
(I've obviously omitted a lot of code here, but you can find examples of how to use this approach relatively easily)
In my testing, most print drivers were equally happy with either printing approach, but the IBM Infoprint driver was EXTREMELY sensitive to the quality. One possible explanation is that the Infoprint printer was required to be configured with a weird fixed DPI and it may be doing a relatively poor job converting.
EDIT: More detailed sample code was requested, so here ya go. Note that getting an EMF file is a pre-req for this approach. In this case I'm using ABC PDF, which lets you generate an EMF file from your PDF with a relatively simple call.
class AbcPrintEmf
{
    private Doc mDoc;
    private string mJobName;
    private string mPrinterName;
    private string mTempFilePath;
    private bool mRenderTextAsPolygon;
    public AbcPrintEmf(Doc printMe, string jobName, string printerName, bool debug, string tempFilePath, bool renderTextAsPolygon)
    {
        mDoc = printMe;
        mDoc.PageNumber = 1;
        mJobName = jobName;
        mPrinterName = printerName;
        mRenderTextAsPolygon = renderTextAsPolygon;
        if (debug)
            mTempFilePath = tempFilePath;
    }
    public void print()
    {
        using (PrintDocument pd = new PrintDocument())
        {
            pd.DocumentName = this.mJobName;
            pd.PrinterSettings.PrinterName = this.mPrinterName;
            pd.PrintController = new StandardPrintController();
            pd.PrintPage += new PrintPageEventHandler(DoPrintPage);
            pd.Print();
        }
    }
    private void DoPrintPage(object sender, PrintPageEventArgs e)
    {
        using (Graphics g = e.Graphics)
        {
            if (mDoc.PageCount == 0) return;
            if (mDoc.Page == 0) return;
            XRect cropBox = mDoc.CropBox;
            // PDF points (1/72 in) to hundredths of an inch
            double srcWidth = (cropBox.Width / 72) * 100;
            double srcHeight = (cropBox.Height / 72) * 100;
            double pageWidth = e.PageBounds.Width;
            double pageHeight = e.PageBounds.Height;
            double marginX = e.PageSettings.HardMarginX;
            double marginY = e.PageSettings.HardMarginY;
            double dstWidth = pageWidth - (marginX * 2);
            double dstHeight = pageHeight - (marginY * 2);
            // if source bigger than destination then scale
            if ((srcWidth > dstWidth) || (srcHeight > dstHeight))
            {
                double sx = dstWidth / srcWidth;
                double sy = dstHeight / srcHeight;
                double s = Math.Min(sx, sy);
                srcWidth *= s;
                srcHeight *= s;
            }
            // now center
            double x = (pageWidth - srcWidth) / 2;
            double y = (pageHeight - srcHeight) / 2;
            // save state
            RectangleF theRect = new RectangleF((float)x, (float)y, (float)srcWidth, (float)srcHeight);
            int theRez = e.PageSettings.PrinterResolution.X;
            // draw content
            mDoc.Rect.SetRect(cropBox);
            mDoc.Rendering.DotsPerInch = theRez;
            mDoc.Rendering.ColorSpace = "RGB";
            mDoc.Rendering.BitsPerChannel = 8;
            if (mRenderTextAsPolygon)
            {
                //i.e. render text as polygon (non default)
                mDoc.SetInfo(0, "RenderTextAsText", "0");
            }
            byte[] theData = mDoc.Rendering.GetData(".emf");
            if (mTempFilePath != null)
            {
                File.WriteAllBytes(mTempFilePath + @"\" + mDoc.PageNumber + ".emf", theData);
            }
            using (MemoryStream theStream = new MemoryStream(theData))
            {
                using (Metafile theEMF = new Metafile(theStream))
                {
                    g.DrawImage(theEMF, theRect);
                }
            }
            e.HasMorePages = mDoc.PageNumber < mDoc.PageCount;
            if (!e.HasMorePages) return;
            //increment to next page, corrupted PDF's have occasionally failed to increment
            //which would otherwise put us in a spooling infinite loop, which is bad, so this check avoids it
            int oldPageNumber = mDoc.PageNumber;
            ++mDoc.PageNumber;
            int newPageNumber = mDoc.PageNumber;
            if ((oldPageNumber + 1) != newPageNumber)
            {
                throw new Exception("PDF cannot be printed as it is corrupt, pageNumbers will not increment properly.");
            }
        }
    }
}
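The scale-and-center arithmetic inside DoPrintPage can be easier to follow (and to test without a printer) when pulled out on its own. The following is a sketch of that same calculation as a standalone helper; PageFit, PointsToHundredths and FitAndCenter are hypothetical names of mine, not part of ABCpdf or System.Drawing.Printing.

```csharp
using System;

public static class PageFit
{
    // Converts PDF points (1/72 in) to the hundredths of an inch
    // in which PrintPageEventArgs.PageBounds is expressed.
    public static double PointsToHundredths(double points)
    {
        return points / 72.0 * 100.0;
    }

    // Shrinks the source to fit inside the page minus its hard margins,
    // keeping the aspect ratio, then centers it on the page.
    // All values are in the same units (hundredths of an inch here).
    public static (double X, double Y, double Width, double Height) FitAndCenter(
        double srcWidth, double srcHeight,
        double pageWidth, double pageHeight,
        double marginX, double marginY)
    {
        double dstWidth = pageWidth - marginX * 2;
        double dstHeight = pageHeight - marginY * 2;

        // Only scale down; a source that already fits is left at its own size.
        if (srcWidth > dstWidth || srcHeight > dstHeight)
        {
            double scale = Math.Min(dstWidth / srcWidth, dstHeight / srcHeight);
            srcWidth *= scale;
            srcHeight *= scale;
        }

        return ((pageWidth - srcWidth) / 2, (pageHeight - srcHeight) / 2, srcWidth, srcHeight);
    }
}
```

As a worked case, a 612 x 792 pt crop box converts to 850 x 1100 hundredths of an inch. On an 850 x 1100 page with hard margins of 25 on each side, the printable area is 800 x 1050, so the content is scaled by 800/850 to about 800 x 1035 and drawn at roughly (25, 32).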
