We are using Magick.NET version 7.0 with Ghostscript 9.16. We are reading in a PDF and converting it to a TIFF or JPG image. Everything works fine when we run these through one at a time and the PDF gets converted.
This is an application that will be hit by many systems, so we put together a small load test to ensure we could handle multiple requests. Everything runs great as long as we use different PDF files. If we run the same PDF file through multiple times (doing 5 requests at the same time with the same PDF), we encounter an error: PDFDelegateFailed. We are not sure why this error occurs; if we convert other formats (such as TIFF to JPG), there are no issues.
ImageMagick.MagickDelegateErrorException: ESBService.exe: PDFDelegateFailed [ghostscript library 9.16] -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r96x96" "-sOutputFile=C:/Users/esbsvc/AppData/Local/Temp/magick-4668LPfdzdzRfLYF%d" "-fC:/Users/esbsvc/AppData/Local/Temp/magick-4668wanF98SE_8PK" "-fC:/Users/esbsvc/AppData/Local/Temp/magick-4668L3mJE6M2iUZV": (null)' # error/pdf.c/ReadPDFImage/788
   at ImageMagick.Wrapper.MagickImageCollection.HandleException(MagickException exception)
   at ImageMagick.Wrapper.MagickImageCollection.Read(Byte[] data, MagickReadSettings readSettings)
   at ImageMagick.MagickImageCollection.Read(Byte[] data, MagickReadSettings readSettings)
   at __DynamicCode.Typeaeb039071464a22ae6518eaa5ec46c.OnExecute(PipelineContext1 context) in c:\Users\esbsvc\AppData\Local\Temp\xp42eval.0.cs:line 112
Any help with this would be appreciated.
Mike H.
There are two likely problems:
1) The C# code is using a single copy of the Ghostscript DLL and you haven't built it to be thread safe (I cannot recall what the default is currently on Windows). In effect you are running multiple threads rather than processes.
2) You have a collision on file access. In order to interpret a PDF file it is necessary to jump around the file a lot; I would guess that two processes tried to relocate the file pointer simultaneously and one failed.
ImageMagick can't handle PDF files directly, unlike image formats (PDF is not an image format, it's much more complex), so it needs to invoke Ghostscript. If you were to try the same with PostScript files you might encounter the same problem; however, since PostScript files are read linearly, you may not have a problem with those.
If you capture the Ghostscript back channel output (and stop using -dQUIET) then you might get some more useful information.
Since you say this 'will be hit by many systems' please check the terms of the AGPL and ensure that your usage is consistent with the licence.
The API documentation of Ghostscript (http://www.ghostscript.com/doc/current/API.htm) states the following:
The Win32 DLL gsdll32.dll can be used by multiple programs simultaneously, but only once within each process.
The version of Magick.NET that you are using does not handle this properly. I just pushed a patch to the Git repository of ImageMagick to make sure the DLL is used only once per process: the first thread will use the library in memory and the second/third/etc. thread will be forced to use the command line. Magick.NET 7.0.0.0022 has just been published and includes this change.
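If you cannot pick up that update straight away, one workaround is to serialize the Ghostscript-backed reads yourself so that only one PDF conversion runs at a time in the process. The sketch below shows that idea; the class and lock names are hypothetical, and only the Read/Write calls come from the Magick.NET API shown in the stack trace above.

using ImageMagick;

static class PdfConverter
{
    // Hypothetical gate: only one Ghostscript-backed read at a time in this process.
    private static readonly object GhostscriptGate = new object();

    public static void ConvertPdf(byte[] pdfBytes, string outputPath)
    {
        var settings = new MagickReadSettings();   // density/format settings would go here

        lock (GhostscriptGate)                      // serialize the call that invokes Ghostscript
        {
            using (var images = new MagickImageCollection())
            {
                images.Read(pdfBytes, settings);    // the same call that appears in the stack trace
                images.Write(outputPath);           // the extension (.tif/.jpg) selects the output format
            }
        }
    }
}

This trades throughput for stability. Conversions that do not go through Ghostscript (TIFF to JPG and so on) do not need the lock, which matches your observation that those formats never fail.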
Related
I have an old application, made with Visual Basic, that can upload and download documents as OLE objects. These are stored in a SQL Server database, in a varbinary(max) field. However, those binaries don't have the same format as regular files, because OLE structures them in its own fashion.
I want to mass-download all those documents with a .NET C# app that reads from SQL Server, but I can't find a way to do it. I have tried copying the binary data into a new file using SqlDataReader, MemoryStream and FileStream, but they write the bytes exactly as stored rather than unwrapping the OLE structure, so the resulting files were corrupted.
Is there a class that can properly interpret these OLE binaries? The old app used an OLE Container component, but that hasn't existed for a few years now.
This is an old post that never got answered. I am in a similar situation myself, in that I need to take images and convert them into OLE files so that Microsoft Access OLE Bound Frames can read them. I have been researching this for days and have just now discovered the binary format that I need, i.e. the Compound File (Structured Storage) format. There is a NuGet package available that should accomplish what we both need: OpenMCDF.
I am still in the process of trying to figure out how to create a compound file from raw binary data, but I feel I am close and will post the resolution when I get it.
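For the extraction side (the original question), the rough direction looks something like the sketch below. This is only a sketch: the "CONTENTS" stream name is a guess, and the actual entry names depend on which application created the OLE wrapper, so list the entries first and adjust accordingly.

using System;
using System.IO;
using OpenMcdf;

static byte[] ExtractEmbeddedDocument(byte[] oleBlob)
{
    using (var stream = new MemoryStream(oleBlob))
    {
        var compoundFile = new CompoundFile(stream);
        try
        {
            // List the entries to see what this particular OLE wrapper actually contains.
            compoundFile.RootStorage.VisitEntries(
                item => Console.WriteLine(item.Name), false);

            // "CONTENTS" is only a guess; swap in whatever stream name your documents use.
            CFStream content = compoundFile.RootStorage.GetStream("CONTENTS");
            return content.GetData();
        }
        finally
        {
            compoundFile.Close();
        }
    }
}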
UPDATE: The project I was working on was not supposed to be this involved, and I didn't want to waste any more time on it after already trying different recommendations for a couple of days. So I ended up creating a complete hack that I hope never to have to repeat. Here are the steps I used:
I set up a form with a maximized window to display each of the images one by one via a timer interval.
Between each interval I took programmatic screen shots (Print Screen) of the images, saving them in a separate folder using an identifiable naming convention.
That portion was done all in VBA.
Once I had full screen shots in a folder, I used Photoshop to crop all the images in a batch to the exact image sizes.
A hack yes, but it accomplished what I needed... what a pain!
I'm using Ghostscript for PDF-to-image conversion
'using Cyotek.GhostScript.PdfConversion;'
'using Cyotek.GhostScript;'
in my web project. It works well when processing a single request at a time, but when it processes more than one request simultaneously it produces an error like 'Failed to process GhostScript command.' My project is a website, so more than one request can hit it at the same time. How can I solve this problem? I'm stuck; please help me.
If you are not using Ghostscript via the command line (Process.Start), then unless the native Ghostscript library has been compiled with the GS_THREADSAFE define, only one instance at a time (per process) is supported. This means you can process only one PDF at a time. I believe Cyotek.GhostScript uses the Ghostscript API and that your native Ghostscript library is compiled without GS_THREADSAFE.
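If moving to the command-line route is acceptable, spawning one Ghostscript process per request sidesteps the single-instance limit entirely. A minimal sketch, in which the executable path, device and resolution are assumptions you would adjust for your setup:

using System;
using System.Diagnostics;

static void RasterizePdf(string pdfPath, string outputPattern)
{
    var startInfo = new ProcessStartInfo
    {
        // Path, device and resolution are assumptions; adjust them to your installation.
        FileName = @"C:\Program Files\gs\gs9.16\bin\gswin64c.exe",
        Arguments = "-dBATCH -dNOPAUSE -dSAFER -sDEVICE=png16m -r150 " +
                    "\"-sOutputFile=" + outputPattern + "\" \"" + pdfPath + "\"",
        UseShellExecute = false,
        RedirectStandardError = true,
        CreateNoWindow = true
    };

    using (var ghostscript = Process.Start(startInfo))
    {
        string diagnostics = ghostscript.StandardError.ReadToEnd();  // Ghostscript back-channel output
        ghostscript.WaitForExit();
        if (ghostscript.ExitCode != 0)
            throw new InvalidOperationException("Ghostscript failed: " + diagnostics);
    }
}

The output pattern would be something like "C:\temp\job-42-page-%d.png", and keeping the stderr text around makes failures much easier to diagnose than a bare 'Failed to process GhostScript command.'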
Alternatively, you could try Ghostscript.NET, which has the ability to load the native Ghostscript library from memory. That way you can have multiple instances of the native Ghostscript library running at the same time, each one in its own context within the same process (without any need for GS_THREADSAFE).
I need to convert a Word document (docx) to a postscript file so that I can use this postscript file to generate PDF using the Ghostscript command line tool.
How do I generate the postscript file from the docx?
I need to do this in .NET/C#. I found out about LaTeX, which can generate PostScript, but how do I get my Word file into LaTeX or any other tool so that the PostScript can be generated?
There are three main products I will mention that understand DOCX.
The obvious one is MS Word. It produces the definitive rendering of all DOCX files; nothing else is ever going to be exactly the same, and by definition it is always correct. However, it is not really designed for automated conversion, and getting it to do this kind of thing is fraught with difficulty. On a legal level, the EULA may conflict with your chosen solution.
OpenOffice.org is a great product. The EULA is much more accommodating, and the freeness is attractive. However, while it will produce pretty good output for most DOCX documents, it does not for all. While it is similar to MS Word it is not the same, and this is something you may notice, particularly for more complex documents. Probably more importantly, again it's not designed for automated conversions, and trying to get it to do this can be fraught and tiresome.
WordGlue .NET (on which I work) is a native .NET library that understands DOCX. It is designed specifically to produce output that is the same as MS Word's. While I'm not going to say it is perfect (it's a big task), it is superior to OpenOffice.org in that it actually attempts this as a specific design decision. However, probably the biggest advantage is that it is designed for high-performance, multi-threaded, server-side conversion. It's native .NET and thus low impact in terms of security.
Products like ABCpdf (on which I work) will integrate with these three applications to allow conversion direct to PDF. Why bother going via PostScript if you want PDF? However, if you really want to save as PostScript, you can do that too.
Or indeed you can write your own code to integrate with these products. Just be aware of the caveats above regarding fraughtness and tiresomeness relating to MS Office and OpenOffice.org. To get these things working unattended requires an awful lot of attention.
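For completeness, here is roughly what "writing your own code" against Word might look like. Treat it as a sketch only: it assumes the Word interop assemblies and some PostScript printer driver are installed (the printer name below is a placeholder), and it is subject to all of the caveats above about unattended Office automation.

using Word = Microsoft.Office.Interop.Word;

static void DocxToPostScript(string docxPath, string psPath)
{
    var word = new Word.Application { Visible = false };
    try
    {
        // Placeholder name; any installed PostScript printer driver will do.
        word.ActivePrinter = "Generic PostScript Printer";

        Word.Document document = word.Documents.Open(docxPath, ReadOnly: true);
        document.PrintOut(Background: false, PrintToFile: true, OutputFileName: psPath);
        document.Close(SaveChanges: false);
    }
    finally
    {
        word.Quit(SaveChanges: false);
    }
}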
You need to print it to a PostScript file from an application which can read .docx files. Or you could just export direct to PDF from the app; as far as I know, anything which reads .docx and can print can also write a PDF file.
If you have a Windows computer you can use the command line:
"%ProgramFiles%\Windows NT\Accessories\wordpad.exe" /pt foobaar.docx "printerThatDumpsPS"
You can find free file printers for PostScript printing on the internet, or you can use the Adobe PDF printer, PDF-XChange or any PostScript printer. You can use C# to temporarily change the printer's settings so that it does this for you.
So, for example, using PDF-XChange as follows:
"%ProgramFiles%\Windows NT\Accessories\wordpad.exe" /pt foobaar.docx "PDF-XChange Printer 2012"
This produces a PDF file without much of a trace anywhere of what program was used, assuming PDF-XChange was set to save the file without asking.
This produces a passable document, but it does lose quite a few features. It might be enough, though.
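If you want to drive this from C# rather than by hand, a small Process.Start wrapper around the same wordpad.exe /pt trick is enough. A sketch, with the printer name left as a parameter:

using System;
using System.Diagnostics;
using System.IO;

static void PrintDocx(string docxPath, string printerName)
{
    string wordpad = Path.Combine(
        Environment.GetFolderPath(Environment.SpecialFolder.ProgramFiles),
        @"Windows NT\Accessories\wordpad.exe");

    var startInfo = new ProcessStartInfo
    {
        FileName = wordpad,
        Arguments = "/pt \"" + docxPath + "\" \"" + printerName + "\"",
        UseShellExecute = false,
        CreateNoWindow = true
    };

    using (var process = Process.Start(startInfo))
    {
        process.WaitForExit();  // wait until WordPad has handed the job to the printer
    }
}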
I need to create a program that reads TIFF files from a directory, trims the bottom inch of each file and re-saves it. I know how to open the files, but how would I automate this process from C#?
If you need to handle TIFF images in C#, have a look at LibTIFF.Net (http://bitmiracle.com/libtiff/). It is an open-source, native .NET component and is free for commercial use.
This library should also have the TIFF cropping functions you need. I am not sure whether the built-in .NET imaging classes can handle all of the TIFF features you may require, whereas LibTIFF will.
The original LibTIFF for C/C++ can be found at http://www.remotesensing.org/libtiff/ which may help you with documentation and support if needed.
Included with LibTIFF is a program called tiffcrop, which also comes with source code: http://www.remotesensing.org/libtiff/man/tiffcrop.1.html (it can be reached via http://www.remotesensing.org/libtiff/tools.html).
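For simple single-page TIFFs you may be able to get away with the built-in System.Drawing classes; for multi-page or unusually encoded TIFFs, the LibTIFF.Net route above is the safer bet. A sketch that trims one inch from the bottom of each file, based on the file's vertical DPI (the folder names are placeholders, and it writes to a separate output folder because the source file is locked while it is open):

using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

static void TrimBottomInch(string inputDir, string outputDir)
{
    foreach (string file in Directory.GetFiles(inputDir, "*.tif"))
    {
        using (var source = new Bitmap(file))
        {
            // One inch expressed in pixels at this file's vertical DPI.
            int trim = (int)source.VerticalResolution;
            if (source.Height <= trim)
                continue;  // image is an inch tall or less; skip it

            var cropArea = new Rectangle(0, 0, source.Width, source.Height - trim);
            using (Bitmap cropped = source.Clone(cropArea, source.PixelFormat))
            {
                cropped.SetResolution(source.HorizontalResolution, source.VerticalResolution);
                cropped.Save(Path.Combine(outputDir, Path.GetFileName(file)), ImageFormat.Tiff);
            }
        }
    }
}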
I need to have the ability to convert and merge various documents into a single Pdf.
The documents could be of varying types, such as Word, Open Office, Images, Text, Web pages (by URL) and the PDF would usually consist of 2-3 documents.
At the moment, we are using BCL Technologies easyPDF with Microsoft Office installed onto the Server. This handles most documents but we haven't had it doing Open Office ones yet.
We currently produce around 100-1000 of these PDFs per day.
The reason I am asking the question is that performance is a key issue. The PDF is generated for users on the fly and so the waiting times we are currently getting of 30-60 seconds is becoming unacceptable.
We have done some caching around documents when they are initially uploaded, so the main task that happens when a user requests a PDF is merging a number of already-generated PDFs.
Does anyone else have any other tools they have used that work reliably for most common document types and above all, quickly? When put like that, it seems like I'm asking a lot!
Edit:
Thanks for all the great advice, I'll look into some of these and compare performance.
Just to add to all this, money is not really an object. We're more than happy to pay for different applications to perform each task as well as looking into various hardware options to distribute the load as much as possible.
Merging multiple PDF documents is normally simple enough (as long as they don't need to be merged onto the same page). You could compare your merge performance with something like iTextSharp (the .NET version of iText) to be sure the merging isn't a bottleneck; otherwise, the conversion from other formats to PDF is likely the bottleneck.
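For reference, a minimal merge with iTextSharp looks something like the sketch below. It assumes the 5.x API (PdfCopy.AddDocument); older versions would need GetImportedPage/AddPage instead, so treat the exact calls as assumptions to verify against your version.

using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

static void MergePdfs(IEnumerable<string> inputPaths, string outputPath)
{
    var document = new Document();
    using (var output = new FileStream(outputPath, FileMode.Create))
    {
        var copy = new PdfCopy(document, output);
        document.Open();

        foreach (string path in inputPaths)
        {
            var reader = new PdfReader(path);
            copy.AddDocument(reader);  // appends every page of this source
            reader.Close();
        }

        document.Close();  // finalizes the PdfCopy output as well
    }
}

Timing a loop like this over your typical 2-3 input files should tell you quickly whether merging or conversion is where the 30-60 seconds is going.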
In almost all cases, the method used to convert X to PDF is to execute the application's print command, targeted at a software PDF printer, to create a temporary PDF file.
This means:
The target application (for example Office) is opened and closed
The document has to travel through the printing service
In your situation, are you converting arbitrary documents submitted by the users, or do the documents come from a stored library of files? If it's a library, you could make a PDF copy of each file as it is added to the library (instead of when the user makes a request), and then only merge the PDF files.
We use ABC Pdf. I don't know if it will be fast enough for your needs, but it seems to work for our use.
I had a very similar issue where we had documents that were already existing in PDF format and needed to allow the user to see them all combined together. We purchased the PDF4NET product which was about $500 from what I recall. It was extremely easy to use and they provide awesome examples of how to use the tools.
O2 Solutions - PDF4NET
Here is the code sample that they provide for merging. The top line looks like it just writes the merged file to disk; the second two lines allow for streaming the content back to the user.
PDFFile.MergeFilesToDisk( "append.pdf", "unicode.pdf", "multicolumntextandimages.pdf" );
PDFDocument doc = PDFFile.MergeFilesToDoc( "append.pdf", "unicode.pdf", "multicolumntextandimages.pdf" );
doc.SaveToStream( stream );
You say you're using Microsoft Office to open these files; I would imagine this is the bottleneck rather than the actual PDF creation.
Is it possible to distill these documents into a more accessible format (HTML/XML/database) so that it's not necessary to open Office every time a PDF needs to be created?
While I have no PDF conversion suggestions I can say that this problem sounds like one which could be distributed over a number of nodes. Do you find that the PDF generation is CPU-bound or are there other limiting factors? Before expending too much effort on rewriting the PDF library interface you might want to see what the bottlenecks are.