Text missing when converting a PDF to PNG using Magick.NET


I have an MVC application that uploads a PDF file and renders each page as a single PNG image using Magick.NET. The conversion is fine in most cases, but for some files parts of the image are blank where text should be, while other lines of text in the same image display correctly. Does anyone know what could be causing this?
Below is the code I'm using.
public FileResult PNGPreview(Guid id, Int32 index)
{
    MagickReadSettings settings = new MagickReadSettings();
    // Read only the requested page of the PDF
    settings.FrameIndex = index;
    settings.FrameCount = 1;
    // Setting the density to 300 dpi will create an image with a better quality
    settings.Density = new PointD(300, 300);
    settings.UseMonochrome = true;
    using (MagickImageCollection images = new MagickImageCollection())
    {
        // Because of FrameIndex and FrameCount, the collection holds a single page
        images.Read(CreateDocument(id), settings);
        using (MemoryStream stream = new MemoryStream())
        {
            images[0].Write(stream, MagickFormat.Png24);
            byte[] result = stream.ToArray();
            return File(result, "image/png");
        }
    }
}
private byte[] CreateDocument(Guid id)
{
    PdfReader reader = new PdfReader(Server.MapPath(String.Format("~/documenttemplates/{0}.pdf", id)));
    byte[] result = null;
    using (MemoryStream ms = new MemoryStream())
    {
        PdfStamper stamper = new PdfStamper(reader, ms, '\0', false);
        stamper.Close();
        reader.Close();
        result = ms.ToArray();
    }
    return result;
}

The PDF file that caused this issue was provided to me by e-mail and I was told that this file was created with Word and then edited with Foxit Pro.
Magick.NET uses Ghostscript to convert the PDF file to an image, executing a command similar to the one below:
"c:\Program Files (x86)\gs\gs9.16\bin\gswin32c.exe" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE
-dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pnggray"
-dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" "-sOutputFile=Test.%d.png" "-fTest.pdf"
Running that command by hand shows that the file is corrupt:
**** Error reading a content stream. The page may be incomplete.
**** File did not complete the page properly and may be damaged.
**** Error reading a content stream. The page may be incomplete.
**** File did not complete the page properly and may be damaged.
**** This file had errors that were repaired or ignored.
**** The file was produced by:
**** >>>> Microsoft® Word 2013 <<<<
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
Because the PDF itself is damaged, this can be solved by creating the input file with a different program.

Related

Google drive upload an image from memory stream

Good day,
I have this code, which uploads an image to Google Drive from a file, and everything works well:
// Create a new file on Google Drive
using (var fsSource = new FileStream(UploadFileName, FileMode.Open, FileAccess.Read))
{
    // Create a new file, with metadata and stream.
    var request = service.Files.Create(fileMetadata, fsSource, "image/jpg");
    request.Fields = "*";
    var results = await request.UploadAsync(CancellationToken.None);
}
Now I want to do some image manipulation before uploading, so that I can convert the image to JPEG if it is in another format (PNG or BMP, for example) or resize it. I changed the code to save the image to a stream for manipulation. I don't want to save it locally again, because the code could be used on a website accessed from mobile devices.
using (MemoryStream ms = new MemoryStream())
{
    Image img = Image.FromFile(UploadFileName);
    img.Save(ms, ImageFormat.Jpeg);
}
How can I now upload this ms stream to Google Drive?
Thanks for any clue; I'm not an expert in this field.

Thanks all for assistance.
The answer suggested by canton7 works:
Just set ms.Position = 0, so that the next read starts from the beginning of the stream, then use it in place of your fsSource in your first snippet.
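The effect of resetting the position can be seen with nothing but System.IO. The following is a minimal sketch, not the Google Drive code itself: the class and method names are made up for illustration, and a second MemoryStream stands in for the uploader that reads from the stream.

```csharp
using System;
using System.IO;

public static class StreamRewindSketch
{
    // Writes the payload into a MemoryStream, then lets a consumer copy from it,
    // the way an uploader would read it. Returns how many bytes the consumer saw.
    public static long BytesSeenByConsumer(byte[] payload, bool rewind)
    {
        using (var ms = new MemoryStream())
        using (var consumer = new MemoryStream())
        {
            ms.Write(payload, 0, payload.Length); // Position is now at the end
            if (rewind)
            {
                ms.Position = 0; // the next read starts at the first byte
            }
            ms.CopyTo(consumer); // copies from the current position to the end
            return consumer.Length;
        }
    }
}
```

Without the rewind the consumer sees no bytes at all, which matches the symptom of an empty or failed upload after img.Save(ms, ...).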

Convert PDF to Image Byte array to save to database

What I am trying to accomplish is allowing my user to upload a PDF. I will then convert that to an image and get the image's byte array. Below is what I have so far.
PdfDocumentProcessor pdfDocumentProcessor = new PdfDocumentProcessor();
using (MemoryStream ms = new MemoryStream(e.UploadedFile.FileBytes))
{
    pdfDocumentProcessor.LoadDocument(ms);
    for (int i = 1; i <= pdfDocumentProcessor.Document.Pages.Count; i++)
    {
        Bitmap image = pdfDocumentProcessor.CreateBitmap(i, 1000);
        try
        {
            // This Save call is where the error is thrown
            image.Save(ms, System.Drawing.Imaging.ImageFormat.Bmp);
        }
        catch (Exception error)
        {
            string message = error.Message;
        }
    }
}
When I try to save the image to the memory stream I get the error "A generic error occurred in GDI+". I believe this has something to do with me not specifying a path for the image to be saved to, but I could be mistaken.
I want to convert the PDF to an image, then get the byte array of the image, and save that to the database. I really don't want to save the image to a specified path.
PdfDocumentProcessor is a DevExpress class that pulls in the PDF and will also give me the PDF's byte array, but I just can't seem to find a way past the save error to retrieve an image byte array.
Any help is appreciated, thank you.

The issue is likely caused by you trying to reuse the same MemoryStream that is holding the input file bytes. You should create a new memory stream to save to.
I don't have access to DevExpress, but I grabbed another NuGet package that I am associated with, https://www.nuget.org/packages/Leadtools.Pdf/, and tested it. This code saves the PDF to a PNG memory stream:
using (var ms = new MemoryStream(fileBytes))
using (var codecs = new RasterCodecs())
{
    codecs.Options.Load.AllPages = true;
    using (var rasterImage = codecs.Load(ms))
    using (var outputStream = new MemoryStream())
        codecs.Save(rasterImage, outputStream, RasterImageFormat.Png, 0);
}
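One reason reusing the input stream can fail is that a MemoryStream constructed over an existing byte[] has a fixed size and cannot grow. The sketch below uses only System.IO to show that behaviour and a small helper that always saves into a fresh stream. The class and method names are hypothetical, and whether this is the exact cause of the GDI+ error in the question is an assumption.

```csharp
using System;
using System.IO;

public static class SeparateOutputStreamSketch
{
    // Tries to append extraBytes to the end of the stream.
    // A stream created with new MemoryStream(byte[]) is not expandable,
    // so appending past its buffer throws NotSupportedException.
    public static bool CanGrow(MemoryStream stream, int extraBytes)
    {
        try
        {
            stream.Seek(0, SeekOrigin.End);
            stream.Write(new byte[extraBytes], 0, extraBytes);
            return true;
        }
        catch (NotSupportedException)
        {
            return false;
        }
    }

    // Runs the given save action against a brand new, expandable stream
    // and returns exactly the bytes that were written.
    public static byte[] SaveToNewStream(Action<Stream> save)
    {
        using (var output = new MemoryStream())
        {
            save(output);
            return output.ToArray();
        }
    }
}
```

Inside the page loop this could be used as byte[] pageBytes = SeparateOutputStreamSketch.SaveToNewStream(s => image.Save(s, System.Drawing.Imaging.ImageFormat.Png)); so that each page gets its own output stream and the input stream is left untouched.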

C# - Microsoft Graph API with iTextSharp - Cannot access a closed Stream

I am using the Microsoft Graph API to get a PDF file from my OneDrive, which I have successfully retrieved with this line:
var streamFile = await graphClient.Me.Drive.Items["{item-id}"].Content.Request().GetAsync();
Now I want to take the stream of this file and edit it with iTextSharp:
using (MemoryStream outFile = new MemoryStream())
{
    //Don't know what to replace this with
    PdfReader pdfReader = new PdfReader("Uploads/Document.pdf");
    PdfStamper pdfStamper = new PdfStamper(pdfReader, outFile);
    AcroFields fields = pdfStamper.AcroFields;
    fields.SetField("Full_Names", "aaa");
    pdfStamper.Close();
    pdfReader.Close();
}
And then upload it back to OneDrive, which I am able to do via this:
//Don't know what to replace this with
var uploadPath = System.Web.HttpContext.Current.Server.MapPath("~/Uploads/NewDocument.pdf");
byte[] data = System.IO.File.ReadAllBytes(uploadPath);
Stream stream = new MemoryStream(data);
await graphClient.Me.Drive.Items["{item-id}"].ItemWithPath("NewDocument.pdf").Content.Request().PutAsync<DriveItem>(stream);
So my question is: how do I take the file that I downloaded and have iTextSharp edit it, so that I can upload the new, edited file?
UPDATE
I tried this:
var streamFile = await graphClient.Me.Drive.Items["{item-id}"].Content.Request().GetAsync();
using (MemoryStream outFile = new MemoryStream())
{
    PdfReader pdfReader = new PdfReader(streamFile);
    PdfStamper pdfStamper = new PdfStamper(pdfReader, outFile);
    AcroFields fields = pdfStamper.AcroFields;
    fields.SetField("Full_Names", "JIMMMMMMAYYYYY");
    await graphClient.Me.Drive.Items["{item-id}"].ItemWithPath("NewDocument-2.pdf").Content.Request().PutAsync<DriveItem>(outFile);
    pdfStamper.Close();
    pdfReader.Close();
}
But got this error:
Cannot access a closed Stream.
I can see the file is being uploaded to my OneDrive, but when I go to open it I get this error:
Failed to load PDF document.
What am I doing wrong here?
UPDATE
When I remove these last two lines:
pdfStamper.Close();
pdfReader.Close();
I no longer get the "Cannot access a closed Stream" error and my file uploads, but I get an error when I open it:
Failed to load PDF document.
UPDATE
When I try this
var streamFile = await graphClient.Me.Drive.Items["{item-id}"].Content.Request().GetAsync();
await graphClient.Me.Drive.Items["{item-id}"].ItemWithPath("NewDocument-2.pdf").Content.Request().PutAsync<DriveItem>(streamFile);
It uploads the file I grabbed, so that part is working, but I can't edit it with iTextSharp.

See if this helps you along:
Do I need to reset a stream(C#) back to the start?
Once you read a stream, you need to reset it back to the beginning to do something else with it.
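A related point is that a MemoryStream which has already been closed by its writer can still hand back its bytes through ToArray(), and those bytes can be wrapped in a new stream that starts at position 0. The sketch below shows this with System.IO only. A StreamWriter stands in for the PDF writer, and the assumption that PdfStamper.Close() closes its output stream in the same way is inferred from the "Cannot access a closed Stream" error, not confirmed by the question. The names are illustrative.

```csharp
using System;
using System.IO;
using System.Text;

public static class ClosedStreamSketch
{
    // Stand-in for a writer that finalizes its output and closes the stream.
    public static byte[] WriteAndClose(string text)
    {
        var outFile = new MemoryStream();
        using (var writer = new StreamWriter(outFile, Encoding.ASCII))
        {
            writer.Write(text);
        } // disposing the StreamWriter flushes it and closes outFile

        // Reading from outFile would now throw ObjectDisposedException,
        // but ToArray() still returns the bytes that were written.
        return outFile.ToArray();
    }

    // Wraps the finished bytes in a new read-only stream positioned at 0,
    // which is what an upload call expects to read from.
    public static Stream ToUploadStream(byte[] finishedBytes)
    {
        return new MemoryStream(finishedBytes, writable: false);
    }
}
```

Applied to the question, that would probably mean closing the stamper first, calling outFile.ToArray(), and passing a new MemoryStream over those bytes to PutAsync, instead of uploading outFile before the stamper has finished writing.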

Use iTextSharp to save a PDF to a SQL Server 2008 Blob, and read that Blob to save to disk

I'm currently trying to use iTextSharp to do some PDF field mapping, but the challenging part right now is just saving the modified file in a varbinary(max) column. Later I need to read that blob and convert it back into a PDF, which I save to a file.
I've been looking all over for example code but I can't find exactly what I'm looking for, and can't seem to piece together the [read from file to iTextSharp object] -> [do my stuff] -> [convert to varbinary(max)] pipeline, nor the conversion of that blob back into a savable file.
If anyone has code snippet examples, that would be extremely helpful. Thanks!

The need to deal with a PDF in multiple passes was not immediately clear to me when I first started working with them, so maybe this is of some help to you.
In the method below, we create a PDF, render it to a byte[], load it for post-processing, render the PDF again and return the result.
The rest of your question deals with getting a byte[] into and out of a varbinary(max) column, saving a byte[] to a file and reading it back, which you can google easily enough.
public byte[] PdfGeneratorAndPostProcessor()
{
    byte[] newPdf;
    using (var pdf = new MemoryStream())
    using (var doc = new Document(iTextSharp.text.PageSize.A4))
    using (PdfWriter.GetInstance(doc, pdf))
    {
        doc.Open();
        // do stuff to the newly created doc...
        doc.Close();
        // ToArray() returns only the bytes written; GetBuffer() would also
        // include the unused tail of the internal buffer
        newPdf = pdf.ToArray();
    }
    byte[] postProcessedPdf;
    var reader = new PdfReader(newPdf);
    using (var pdf = new MemoryStream())
    using (var stamper = new PdfStamper(reader, pdf))
    {
        var pageCount = reader.NumberOfPages;
        for (var i = 1; i <= pageCount; i++)
        {
            // do something on each page of the existing pdf
        }
        stamper.Close();
        postProcessedPdf = pdf.ToArray();
    }
    reader.Close();
    return postProcessedPdf;
}
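When copying a finished PDF out of a MemoryStream, it matters whether GetBuffer() or ToArray() is used: GetBuffer() returns the whole internal buffer, including unused capacity, while ToArray() returns only the bytes that were written. The sketch below shows the difference with System.IO alone. The class name and the explicit capacity are chosen for illustration.

```csharp
using System;
using System.IO;

public static class BufferVersusArraySketch
{
    // Writes the payload into a MemoryStream created with the given capacity
    // and returns { GetBuffer().Length, ToArray().Length }.
    public static int[] Lengths(byte[] payload, int capacity)
    {
        using (var ms = new MemoryStream(capacity))
        {
            ms.Write(payload, 0, payload.Length);
            return new[] { ms.GetBuffer().Length, ms.ToArray().Length };
        }
    }
}
```

Storing the GetBuffer() result in a varbinary(max) column or a file would append the padding bytes after the end of the PDF, so ToArray() is the safer choice when the exact content is needed.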

C#: Getting Error: "A generic error occurred in GDI+" When trying to copy imagebox.Image into memory stream. This stream is not being saved to file.

I am getting an odd error with Image.Save(Stream, ImageFormat).
I tried looking around on here for a solution, but everyone seems to think the error comes from file permissions. That can't be it in my case, because the stream isn't going into a file. My code is below:
MemoryStream Stream = new MemoryStream();
this.Image_Box_1.Image.Save(Stream, System.Drawing.Imaging.ImageFormat.Jpeg);
TI.AlbumCover = Stream.ToArray();
Stream.Close();
TI.AlbumCover is a byte[].
Does anyone have any ideas on what the problem might be?
EDIT:
Ok, so I worked it out. The original file could sometimes come from a jpg file, sometimes from a byte array (part of an id3 tag). The problem was that when the image came from the file, I was closing the stream after creating the image box image. While the image remained visible, the data was no longer available.
Since I also later needed to overwrite that jpg file, I could not simply leave the filestream for it open so I left the rest of my code the same and changed the code to read from the jpg to the following:
FileStream FS = new FileStream(File, FileMode.Open, FileAccess.Read); // Read in the jpg file
Image IMG = Image.FromStream(FS); // Create an image from the file data
MemoryStream MS = new MemoryStream();
IMG.Save(MS, System.Drawing.Imaging.ImageFormat.Jpeg); // Save the image data to a memory stream
byte[] temp = MS.ToArray(); // Copy the image data to a byte array
// Close the streams
MS.Close();
FS.Close();
return temp; // was originally returning an image
Then I changed the code that places the image into the image box to:
try
{
    if (this.m_V2Tag.AlbumCover != null)
    {
        this.Image_Box_1.Image = Image.FromStream(new MemoryStream(this.m_V2Tag.AlbumCover));
    }
    else
    {
        // changed code
        MemoryStream temp = new MemoryStream(this.getFolderJpg()); // create a memory stream from the byte[]. This stream can safely be left open.
        this.Image_Box_1.Image = Image.FromStream(temp); // create image and assign it to the image box
    }
}
catch
{
    this.Image_Box_1.Image = null;
}
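The pattern described above, reading the file fully into memory so that the file handle is released while the image keeps a stream it can rely on, can be sketched with System.IO alone. The helper name is hypothetical, and the Image.FromStream call is left to the caller.

```csharp
using System;
using System.IO;

public static class DetachedStreamSketch
{
    // Reads the whole file into memory and returns a stream over that copy.
    // File.ReadAllBytes opens, reads and closes the file, so the file can be
    // overwritten or deleted afterwards while the returned stream stays usable.
    public static MemoryStream LoadDetached(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return new MemoryStream(bytes);
    }
}
```

An image created with Image.FromStream(DetachedStreamSketch.LoadDetached(path)) then depends only on the in-memory stream, which can be left open for the lifetime of the image.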
