Convert PDF to Image Batch - c#

I am working on a solution that converts PDF files to images.
I am using the following example from CodeProject:
http://www.codeproject.com/Articles/317700/Convert-a-PDF-into-a-series-of-images-using-Csharp?msg=4134859#xx4134859xx
I tried the following code to generate images from more than 1000 PDF files:
using Cyotek.GhostScript;
using Cyotek.GhostScript.PdfConversion;
using System;
using System.Collections.Generic;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace RefClass_PDF2Image
{
class Program
{
static void Main(string[] args)
{
string outputPath = Properties.Settings.Default.outputPath;
string pdfPath = Properties.Settings.Default.pdfPath;
if (!Directory.Exists(outputPath))
{
Console.WriteLine("The specified export path " + outputPath + " was not found. Please change the path (outputPath) in the App.config file.");
return;
}
else
{
Console.WriteLine("Output path: " + outputPath + " found.");
}
if (!Directory.Exists(pdfPath))
{
Console.WriteLine("The specified path " + pdfPath + " to the PDF drawings was not found. Please change the path (pdfPath) in the App.config file.");
return;
}
else
{
Console.WriteLine("PDF path: " + pdfPath + " found.");
}
Pdf2ImageSettings settings = GetPDFSettings();
DateTime start = DateTime.Now;
TimeSpan span;
Console.WriteLine("");
Console.WriteLine("Starting extraction of the PDF drawings: " + start.ToShortTimeString());
Console.WriteLine("");
DirectoryInfo diretoryInfo = new DirectoryInfo(pdfPath);
DirectoryInfo[] directories = diretoryInfo.GetDirectories();
Console.WriteLine("");
Console.WriteLine("Found " + directories.Length + " different directories.");
Console.WriteLine("");
List<string> filenamesPDF = Directory.GetFiles(pdfPath, "*.pdf*", SearchOption.AllDirectories).Select(x => Path.GetFullPath(x)).ToList();
List<string> filenamesOutput = Directory.GetFiles(outputPath, "*.*", SearchOption.AllDirectories).Select(x => Path.GetFullPath(x)).ToList();
Console.WriteLine("");
Console.WriteLine("Found " + filenamesPDF.Count + " different PDF drawings.");
Console.WriteLine("");
List<string> newFileNames = new List<string>();
int cutLength = pdfPath.Length;
for (int i = 0; i < filenamesPDF.Count; i++)
{
string temp = filenamesPDF[i].Remove(0, cutLength);
temp = outputPath + temp;
// Change only the extension; string.Replace would also alter "pdf" elsewhere in the path
temp = Path.ChangeExtension(temp, ".jpg");
newFileNames.Add(temp);
}
for (int i = 0; i < filenamesPDF.Count; i++)
{
FileInfo fi = new FileInfo(newFileNames[i]);
if (!fi.Exists)
{
if (!Directory.Exists(fi.DirectoryName))
{
Directory.CreateDirectory(fi.DirectoryName);
}
Bitmap firstPage = new Pdf2Image(filenamesPDF[i], settings).GetImage();
firstPage.Save(newFileNames[i], System.Drawing.Imaging.ImageFormat.Jpeg);
firstPage.Dispose();
}
//if (i % 20 == 0)
//{
// GC.Collect();
// GC.WaitForPendingFinalizers();
//}
}
Console.ReadLine();
}
private static Pdf2ImageSettings GetPDFSettings()
{
Pdf2ImageSettings settings;
settings = new Pdf2ImageSettings();
settings.AntiAliasMode = AntiAliasMode.Medium;
settings.Dpi = 150;
settings.GridFitMode = GridFitMode.Topological;
settings.ImageFormat = ImageFormat.Png24;
settings.TrimMode = PdfTrimMode.CropBox;
return settings;
}
}
}
Unfortunately, I always get an out-of-memory exception in Pdf2Image.cs. Here is the code:
public Bitmap GetImage(int pageNumber)
{
Bitmap result;
string workFile;
//if (pageNumber < 1 || pageNumber > this.PageCount)
// throw new ArgumentException("Page number is out of bounds", "pageNumber");
if (pageNumber < 1)
throw new ArgumentException("Page number is out of bounds", "pageNumber");
workFile = Path.GetTempFileName();
try
{
this.ConvertPdfPageToImage(workFile, pageNumber);
using (FileStream stream = new FileStream(workFile, FileMode.Open, FileAccess.Read))
{
result = new Bitmap(stream); // --->>> here is the out of memory exception
stream.Close();
stream.Dispose();
}
}
finally
{
File.Delete(workFile);
}
return result;
}
How can I fix this to avoid the exception?
Thanks for any help,
tro

I don't know if this is worth it for you, but it appears that you can do what you want without having a Bitmap in the middle. Pdf2Image has this code in it:
public void ConvertPdfPageToImage(string outputFileName, int pageNumber)
{
    if (pageNumber < 1 || pageNumber > this.PageCount)
        throw new ArgumentException("Page number is out of bounds", "pageNumber");
    using (GhostScriptAPI api = new GhostScriptAPI())
        api.Execute(this.GetConversionArguments(this._pdfFileName, outputFileName, pageNumber, this.PdfPassword, this.Settings));
}
which writes a file for you where you want it. Why not just call that method directly instead of reading the image back in and writing it back out?
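A minimal sketch of that idea, assuming the Pdf2Image constructor and ConvertPdfPageToImage method shown in this thread. The class and method names (DirectPdfExport, ConvertFirstPages) are made up for illustration, and the ".png" extension is an assumption based on the Png24 setting in the question, since the output format comes from the settings rather than from the file name:

```csharp
using System.IO;
using Cyotek.GhostScript;
using Cyotek.GhostScript.PdfConversion;

// Hypothetical helper, not part of the Cyotek library.
static class DirectPdfExport
{
    // Lets Ghostscript write page 1 of every PDF straight to disk,
    // so no Bitmap is ever created in this process.
    public static void ConvertFirstPages(string pdfPath, string outputPath, Pdf2ImageSettings settings)
    {
        foreach (string pdfFile in Directory.GetFiles(pdfPath, "*.pdf", SearchOption.AllDirectories))
        {
            // Mirror the source folder layout below outputPath.
            string relative = pdfFile.Substring(pdfPath.Length)
                .TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
            string target = Path.ChangeExtension(Path.Combine(outputPath, relative), ".png");

            if (File.Exists(target))
                continue; // already converted in an earlier run

            Directory.CreateDirectory(Path.GetDirectoryName(target));
            new Pdf2Image(pdfFile, settings).ConvertPdfPageToImage(target, 1);
        }
    }
}
```

Note that ConvertPdfPageToImage checks PageCount, which the question's GetImage had commented out, so this path may behave differently for PDFs whose page count cannot be read.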

This might not answer your question directly, but it could still be useful: ImageMagick provides a simple way of creating images from PDFs in batch mode.
Single PDF file to many JPGs:
convert -geometry 1024x768 -density 200 -colorspace RGB test.pdf +adjoin test_%0d.jpg
Or, if you want to process many PDF files:
mogrify -format jpg -alpha off -density 150 -quality 80 -resize 768 -unsharp 1.5 *.pdf
(The settings should obviously be adapted to your needs :) )
To do this programmatically in C#, you could use the .NET ImageMagick wrapper:
http://imagemagick.codeplex.com

Wrap the resulting bitmap in a using statement:
using (FileStream stream = new FileStream(workFile, FileMode.Open, FileAccess.Read))
using (Bitmap result = new Bitmap(stream))
{
    ...
}
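Since GetImage returns the bitmap, a using block inside that method would dispose it before the caller sees it. One way to apply the advice is on the calling side instead. The following is a sketch under that assumption; FirstPageExport and SaveFirstPage are hypothetical names, and only the Pdf2Image calls already shown in the question are used:

```csharp
using System.Drawing;
using Cyotek.GhostScript;
using Cyotek.GhostScript.PdfConversion;

// Hypothetical helper, not part of the Cyotek library.
static class FirstPageExport
{
    public static void SaveFirstPage(string pdfFile, string jpgFile, Pdf2ImageSettings settings)
    {
        // using guarantees Dispose runs even if Save throws,
        // which a bare firstPage.Dispose() call after Save does not.
        using (Bitmap firstPage = new Pdf2Image(pdfFile, settings).GetImage())
        {
            // Fully qualified because the Cyotek namespace has its own ImageFormat enum.
            firstPage.Save(jpgFile, System.Drawing.Imaging.ImageFormat.Jpeg);
        }
    }
}
```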

Related

Using C# and MediaInfo How to detect HDR format

I'm using the MediaInfo NuGet wrapper to analyse a bunch of files.
Using the actual MediaInfo tool on a file, I can see:
Video
ID : 1
Format : HEVC
Format/Info : High Efficiency Video Coding
Format profile : Main 10#L5.1#Main
HDR format : Dolby Vision, Version 1.0, dvhe.05.09, BL+RPU
Codec ID : dvhe
Codec ID/Info : High Efficiency Video Coding with Dolby Vision
This is also shown by Console.WriteLine(mw1.Inform());
However, I'm unable to get that value from the code below.
I've tried "HDR format", "HDRformat" and other spellings, but it always returns "".
Given that every file will be different, is there a more dynamic way of doing this than hard-coding each property?
The code is still at the testing stage:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using MediaInfo;
using MediaInfo.Model;
namespace GetMediaInfo
{
class Program
{
static void Main(string[] args)
{
string BaseFold = @"Path\To\Test\Samples";
string[] Files = Directory.GetFiles(BaseFold, "*.*", SearchOption.AllDirectories);
foreach (var Vid in Files)
{
string VidName = Path.GetFileName(Vid);
if (VidName.EndsWith("jpg"))
{
continue;
}
Console.WriteLine(VidName);
var mw1 = new MediaInfo.MediaInfo();
mw1.Option("ParseSpeed", "0");
mw1.Open(Vid);
string ToDisplay = "";
var videostreamcount = mw1.CountGet(StreamKind.Video, 0);
var AudioStreamcount = mw1.CountGet(StreamKind.Audio, 0);
if (videostreamcount > 0)
{
Console.WriteLine(mw1.Inform());
foreach (var item in mw1.Get(StreamKind.Video,0,"*"))
{
Console.WriteLine(item);
}
var Height = mw1.Get(StreamKind.Video, 0, "Height");
var Width = mw1.Get(StreamKind.Video, 0, "Width");
var VidFormat = mw1.Get(StreamKind.Video, 0, "Format");
var HDRformat = mw1.Get(StreamKind.Video, 0, "HDR format"); // Always = ""
var Codec = mw1.Get(StreamKind.Video, 0, "CodecID/Info");
var CodecID = mw1.Get(StreamKind.Video, 0, "CodecID");
Console.WriteLine("Height " + Height + ", Width " + Width + ", Codec " + Codec + ", CodecID " + CodecID + ", Format " + VidFormat + " , HDR format " + HDRformat);
ToDisplay += "\r\n\r\nInfo_Parameters\r\n";
ToDisplay += mw1.Option("Info_Parameters");
//ToDisplay += "\r\n\r\nInfo_Capacities\r\n";
//ToDisplay += mw1.Option("Info_Capacities");
//ToDisplay += "\r\n\r\nInfo_Codecs\r\n";
//ToDisplay += mw1.Option("Info_Codecs");
// Console.WriteLine(ToDisplay);
}
else
{
Console.WriteLine("Error No video streams in file");
}
if (AudioStreamcount > 0)
{
var AudioCodec = mw1.Get(StreamKind.Audio, 0, "CodecID/Info");
var AudioCodecID = mw1.Get(StreamKind.Audio, 0, "CodecID");
var AudioFormat = mw1.Get(StreamKind.Audio, 0, "Format");
Console.WriteLine("AudioCodec: {0}, AudioCodecID: {1}, AudioFormat {2}", AudioCodec, AudioCodecID, AudioFormat);
}
else
{
Console.WriteLine("Error No Audio streams in file");
}
}
Console.ReadLine();
}
}
}
Thanks
In reply to "I've tried HDR format, HDRformat and other spellings but it always returns """: the key is
HDR_Format
Tip: to find out which keys to use, run the MediaInfo command line with "--Language=raw", or use the graphical interface with XML output.
Jérôme, developer of MediaInfo
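Applied to the question's code, the lookup might look like the sketch below. It uses only the wrapper calls that already appear in the question (Open and Get); HdrProbe and GetHdrFormat are hypothetical names, and releasing the MediaInfo handle is left out because the question does not show how the wrapper does it:

```csharp
using MediaInfo;

// Hypothetical helper for illustration.
static class HdrProbe
{
    // Returns e.g. "Dolby Vision, Version 1.0, dvhe.05.09, BL+RPU",
    // or an empty string when the first video stream carries no HDR information.
    public static string GetHdrFormat(string path)
    {
        var mi = new MediaInfo.MediaInfo();
        mi.Open(path);
        // Raw parameter names use underscores, not the display label "HDR format".
        return mi.Get(StreamKind.Video, 0, "HDR_Format");
    }
}
```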

Downloading a directory using SSH.NET SFTP in C#

I am using Renci.SSH and C# to connect to my Unix server from a Windows machine. My code works as expected when the directory contains only files, but if the directory contains a folder, I get this:
Renci.SshNet.Common.SshException: 'Failure'
This is my code. How can I update it to also download a directory (if one exists)?
private static void DownloadFile(string arc, string username, string password)
{
string fullpath;
string fp;
var options = new ProgressBarOptions
{
ProgressCharacter = '.',
ProgressBarOnBottom = true
};
using (var sftp = new SftpClient(Host, username, password))
{
sftp.Connect();
fp = RemoteDir + "/" + arc;
if (sftp.Exists(fp))
fullpath = fp;
else
fullpath = SecondaryRemoteDir + d + "/" + arc;
if (sftp.Exists(fullpath))
{
var files = sftp.ListDirectory(fullpath);
foreach (var file in files)
{
if (file.Name.ToLower().Substring(0, 1) != ".")
{
Console.WriteLine("Downloading file from the server...");
Console.WriteLine();
using (var pbar = new ProgressBar(100, "Downloading " + file.Name + "....", options))
{
SftpFileAttributes att = sftp.GetAttributes(fullpath + "/" + file.Name);
var fileSize = att.Size;
var ms = new MemoryStream();
IAsyncResult asyncr = sftp.BeginDownloadFile(fullpath + "/" + file.Name, ms);
SftpDownloadAsyncResult sftpAsyncr = (SftpDownloadAsyncResult)asyncr;
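int lastpct = 0;
while (!sftpAsyncr.IsCompleted)
{
// Multiply before dividing; dividing first truncates to 0 until the download is complete
int pct = fileSize > 0 ? (int)((long)sftpAsyncr.DownloadedBytes * 100 / fileSize) : 100;
if (pct > lastpct)
{
for (int i = lastpct; i < pct; i++)
pbar.Tick();
lastpct = pct;
}
}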
sftp.EndDownloadFile(asyncr);
Console.WriteLine("Writing File to disk...");
Console.WriteLine();
string localFilePath = @"C:\" + file.Name;
var fs = new FileStream(localFilePath, FileMode.Create, FileAccess.Write);
ms.WriteTo(fs);
fs.Close();
ms.Close();
}
}
}
}
else
{
Console.WriteLine("The arc " + arc + " does not exist");
Console.WriteLine();
Console.WriteLine("Please press any key to close this window");
Console.ReadKey();
}
}
}
BeginDownloadFile downloads a file. You cannot use it to download a folder. For that you need to download contained files one by one.
The following example uses synchronous download (DownloadFile instead of BeginDownloadFile) for simplicity. After all, you are synchronously waiting for asynchronous download to complete anyway. To implement a progress bar with synchronous download, see Displaying progress of file download in a ProgressBar with SSH.NET.
public static void DownloadDirectory(
    SftpClient sftpClient, string sourceRemotePath, string destLocalPath)
{
    Directory.CreateDirectory(destLocalPath);
    IEnumerable<SftpFile> files = sftpClient.ListDirectory(sourceRemotePath);
    foreach (SftpFile file in files)
    {
        if ((file.Name != ".") && (file.Name != ".."))
        {
            string sourceFilePath = sourceRemotePath + "/" + file.Name;
            string destFilePath = Path.Combine(destLocalPath, file.Name);
            if (file.IsDirectory)
            {
                DownloadDirectory(sftpClient, sourceFilePath, destFilePath);
            }
            else
            {
                using (Stream fileStream = File.Create(destFilePath))
                {
                    sftpClient.DownloadFile(sourceFilePath, fileStream);
                }
            }
        }
    }
}

C# System.IO.IOException

I have the following code:
using System;
using System.Collections.Generic;
using System.IO;
using VirusTotalNET;
using VirusTotalNET.Objects;
using System.Linq;
using System.Security.Permissions;
namespace VirusTotalNETClient
{
class Program
{
private const string ScanUrl = "http://www.google.com/";
static void Main(string[] args)
{
VirusTotal virusTotal = new VirusTotal("5d8684f50946c2bdeaf5c4fd966f61f3661de808e9d7324b99788d6f4fb7ad57");
//Use HTTPS instead of HTTP
virusTotal.UseTLS = true;
//creating folder for programs reliqies and output log
string folderName = "C:\\OnlineScanner";
System.IO.Directory.CreateDirectory(folderName);
//get list of files to analyse
var paths = Traverse("C:\test");
File.WriteAllLines("C:\\OnlineScanner\\test.txt", paths);
foreach (string line in File.ReadLines("C:\\test.txt"))
{
//Define what file you want to analyse
FileInfo fileInfo = new FileInfo(line);
//Check if the file has been scanned before.
FileReport fileReport = virusTotal.GetFileReport(fileInfo);
bool hasFileBeenScannedBefore = fileReport.ResponseCode == ReportResponseCode.Present;
//If the file has been scanned before, the results are embedded inside the report.
if (hasFileBeenScannedBefore)
{
int detekce = fileReport.Positives;
if (detekce >= 1)
{
using (var writer = new StreamWriter("C:\\OnlineScanner\\OnlineScannerLog.txt"))
{
writer.WriteLine(line);
writer.WriteLine("URL to test: " + fileReport.Permalink);
writer.WriteLine("Detect ratio: " + fileReport.Positives + "/54");
writer.WriteLine("Message: " + fileReport.VerboseMsg);
writer.WriteLine();
writer.WriteLine();
}
}
System.Threading.Thread.Sleep(16000);
}
else
{
ScanResult fileResult = virusTotal.ScanFile(fileInfo);
int detekce = fileReport.Positives;
if (detekce >= 1)
{
using (var writer = new StreamWriter("C:\\OnlineScanner\\OnlineScannerLog.txt"))
{
writer.WriteLine(line);
writer.WriteLine("URL to test: " + fileReport.Permalink);
writer.WriteLine("Detect ratio: " + fileReport.Positives + "/54");
writer.WriteLine("Message: " + fileReport.VerboseMsg);
writer.WriteLine();
writer.WriteLine();
}
}
System.Threading.Thread.Sleep(16000);
}
}
}
private static IEnumerable<string> Traverse(string rootDirectory)
{
IEnumerable<string> files = Enumerable.Empty<string>();
IEnumerable<string> directories = Enumerable.Empty<string>();
try
{
// The test for UnauthorizedAccessException.
var permission = new FileIOPermission(FileIOPermissionAccess.PathDiscovery, rootDirectory);
permission.Demand();
files = Directory.GetFiles(rootDirectory);
directories = Directory.GetDirectories(rootDirectory);
}
catch
{
// Ignore folder (access denied).
rootDirectory = null;
}
foreach (var file in files)
{
yield return file;
}
// Recursive call for SelectMany.
var subdirectoryItems = directories.SelectMany(Traverse);
foreach (var result in subdirectoryItems)
{
yield return result;
}
}
}
}
This code runs for some time (around 15 seconds), but then the program crashes.
The error is:
System.IO.IOException: the process cannot access the file C:\hiberfil.sys.
http://upnisito.cz/images/2016_12/319crasherrror.png
Do you have any idea how to solve it?

SevenZipSharp, how to read txt file?

I am trying to read txt files from a .7z archive:
using (StreamReader reader = new StreamReader(f + "//" + file.FileName))
but I get this error:
An unhandled exception of type 'System.IO.DirectoryNotFoundException' occurred in mscorlib.dll
Additional information: Could not find a part of the path 'E:\1.7z\1\2\3\New Text Document.txt'.
if (IntPtr.Size == 8) //x64
{
SevenZip.SevenZipExtractor.SetLibraryPath(@"C:\Program Files\7-Zip\7z.dll");
}
else //x86
{
SevenZip.SevenZipCompressor.SetLibraryPath(@"C:\Program Files (x86)\7-Zip\7z.dll");
}
string f = "E://1.7z";
SevenZipExtractor extractor = new SevenZipExtractor(f);
foreach (ArchiveFileInfo file in extractor.ArchiveFileData)
{
// Console.WriteLine("{0} : {1} Bytes", file.FileName, file.Size);
if (file.FileName.EndsWith(".txt", StringComparison.OrdinalIgnoreCase))
{
using (StreamReader reader = new StreamReader(f + "//" + file.FileName))
{
while (reader.Peek() >= 0)
{
Console.WriteLine("{0} ", reader.ReadLine());
}
}
}
}
To use SevenZipExtractor to extract a file, use the following:
String file = @"\\yourdirectory\\yourzipfile.zip";
String directoryToExtract = @"\\yourdirectorytoextract";
using (SevenZip.SevenZipExtractor extr = new SevenZip.SevenZipExtractor(file))
{
    Console.WriteLine("Extracting File...");
    extr.ExtractArchive(directoryToExtract);
    System.IO.File.Delete(file);
}
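The DirectoryNotFoundException in the question occurs because an entry inside an archive is not a file-system path, so StreamReader cannot open it directly. If the text is only needed in memory, a variation on the answer is to extract each matching entry into a MemoryStream instead of onto disk. This is a sketch, assuming SevenZipSharp's ExtractFile(int, Stream) overload and that the 7z.dll library path has already been set as in the question; ArchiveTextReader and PrintTextFiles are hypothetical names:

```csharp
using System;
using System.IO;
using SevenZip;

// Hypothetical helper for illustration.
static class ArchiveTextReader
{
    public static void PrintTextFiles(string archivePath)
    {
        using (var extractor = new SevenZipExtractor(archivePath))
        {
            foreach (ArchiveFileInfo entry in extractor.ArchiveFileData)
            {
                if (entry.IsDirectory ||
                    !entry.FileName.EndsWith(".txt", StringComparison.OrdinalIgnoreCase))
                    continue;

                using (var buffer = new MemoryStream())
                {
                    // Decompress this single entry into memory.
                    extractor.ExtractFile(entry.Index, buffer);
                    buffer.Position = 0; // rewind before reading

                    using (var reader = new StreamReader(buffer))
                    {
                        string line;
                        while ((line = reader.ReadLine()) != null)
                            Console.WriteLine(line);
                    }
                }
            }
        }
    }
}
```

Unlike the answer's snippet, this leaves the archive in place and writes nothing to disk.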

Random "Parameter is not valid" error with bitmap

The method below takes in some FileInfo of a certain pdf file and will then proceed to convert that pdf into bitmaps. This method works randomly it seems. Sometimes it will finish with no problems, other times it will fail at the using (Bitmap bmp = image.ToBitmap()) part. Once it hits that line I get a "Parameter is not valid" error. I have no clue how to fix this random error nor dissect it even further. Any help would be appreciated
static void ParseOutEachBitmap(FileInfo[] pdfFiles)
{
string BmpPath = "C:\\temp\\bmps\\";
if (!Directory.Exists(BmpPath))
{
Directory.CreateDirectory(BmpPath);
}
using (MagickImageCollection images = new MagickImageCollection())
{
MagickReadSettings settings = new MagickReadSettings();
settings.Density = new MagickGeometry(300, 300);
for (int p = 0; p < pdfFiles.Count(); p++)
{
images.Read(@"c:\temp\pdfs\" + pdfFiles[p].Name, settings);
int pageNumber = 1;
string pdfName = pdfFiles[p].Name;
foreach (MagickImage image in images)
{
using (Bitmap bmp = image.ToBitmap())
{
Console.WriteLine("PDF Filename: " + pdfName);
Console.WriteLine("Page Number: " + pageNumber + " of " + images.Count);
pageNumber++;
using (tessnet2.Tesseract tessocr = new tessnet2.Tesseract())
{
tessocr.GetThresholdedImage(bmp, System.Drawing.Rectangle.Empty).Save("c:\\temp\\bmps\\" + Guid.NewGuid().ToString() + ".bmp");
}
}
}
}
}
}
