Convert PDF to TIFF using ImageMagick & C#


I have an existing program that processes a .pdf file and splits it into multiple .pdf files based on barcodes it finds on the pages.
The program uses ImageMagick and C#.
I want to change it from outputting PDFs to outputting TIFFs. Look for the comment in the code below for where I would guess the change would be made.
I included the ImageMagick tag because someone might offer a command-line option that someone else can help me convert to C#.
private void BurstPdf(string bigPdfName, string targetfolder)
{
bool outputPdf = true; // change to false to output tif.
string outputExtension = "";
var settings = new MagickReadSettings { Density = new Density(200) };
string barcodePng = Path.Combine(@"C:\TEMP", "tmp.png");
using (MagickImageCollection pdfPageCollection = new MagickImageCollection())
{
pdfPageCollection.Read(bigPdfName, settings);
int inputPageCount = 0;
int outputPageCount = 0;
int outputFileCount = 0;
MagickImageCollection resultCollection = new MagickImageCollection();
string barcode = "";
string resultName = "";
IBarcodeReader reader = new BarcodeReader();
reader.Options.PossibleFormats = new List<BarcodeFormat>();
reader.Options.PossibleFormats.Add(BarcodeFormat.CODE_39);
reader.Options.TryHarder = false;
foreach (MagickImage pdfPage in pdfPageCollection)
{
MagickGeometry barcodeArea = getBarCodeArea(pdfPage);
IMagickImage barcodeImg = pdfPage.Clone();
barcodeImg.ColorType = ColorType.Bilevel;
barcodeImg.Depth = 1;
barcodeImg.Alpha(AlphaOption.Off);
barcodeImg.Crop(barcodeArea);
barcodeImg.Write(barcodePng);
inputPageCount++;
using (var barcodeBitmap = new Bitmap(barcodePng))
{
var result = reader.Decode(barcodeBitmap);
if (result != null)
{
// found a first page because it has bar code.
if (result.BarcodeFormat.ToString() == "CODE_39")
{
if (outputFileCount != 0)
{
// write out previous pages.
if (outputPdf) {
outputExtension = ".pdf";
} else {
// What do I put here to output a g4 compressed tif?
outputExtension = ".tif";
}
resultName = string.Format("{0:D4}", outputFileCount) + "-" + outputPageCount.ToString() + "-" + barcode + outputExtension;
resultCollection.Write(Path.Combine(targetfolder, resultName));
resultCollection = new MagickImageCollection();
}
barcode = standardizePhysicalBarCode(result.Text);
outputFileCount++;
resultCollection.Add(pdfPage);
outputPageCount = 1;
}
else
{
Console.WriteLine("WARNING barcode is not of type CODE_39 so something is wrong. check page " + inputPageCount + " of " + bigPdfName);
if (inputPageCount == 1)
{
throw new Exception("barcode not found on page 1. see " + barcodePng);
}
resultCollection.Add(pdfPage);
outputPageCount++;
}
}
else
{
if (inputPageCount == 1)
{
throw new Exception("barcode not found on page 1. see " + barcodePng);
}
resultCollection.Add(pdfPage);
outputPageCount++;
}
}
if (File.Exists(barcodePng))
{
File.Delete(barcodePng);
}
}
if (resultCollection.Count > 0)
{
if (outputPdf) {
outputExtension = ".pdf";
} else {
// What do I put here to output a g4 compressed tif?
outputExtension = ".tif";
}
resultName = string.Format("{0:D4}", outputFileCount) + "-" + outputPageCount.ToString() + "-" + barcode + outputExtension;
resultCollection.Write(Path.Combine(targetfolder, resultName));
outputFileCount++;
}
}
}
[EDIT] The above code is what I am using (with some untested modifications) to split a .pdf into other .pdfs. I want to know how to modify this code to output TIFFs. I put a comment in the code where I think the change would go.
[EDIT] Encouraged by @fmw42, I just ran the code with the .tif extension enabled. It did convert to a .tif, but the tif is not compressed. I am surprised that IM configures the output format based only on the file name extension. Handy, I guess, but it seems a little loose.
[EDIT] I figured it out. Although it is counter-intuitive, one sets the compression when reading the file. I am reading a .pdf, but I set the compression to Group4 like this:
var settings = new MagickReadSettings { Density = new Density(200), Compression = CompressionMethod.Group4 };
The thing I learned was that simply naming the output file .tif tells IM to output a tif. That is a handy way to do it, but it just seems sloppy.
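For readers who want the two findings in one place, here is a minimal sketch. It only uses the Magick.NET calls already shown in the question (MagickReadSettings with Density and Compression, MagickImageCollection.Read and Write). The class name PdfToTiffSketch and both method names are hypothetical, and BuildResultName is just a suggested way to avoid repeating the extension logic in two places. Reading PDFs through ImageMagick normally needs Ghostscript installed, which is assumed here.

```csharp
using ImageMagick;

// Hypothetical helper class; none of these names exist in the original program.
public static class PdfToTiffSketch
{
    // Builds the same "0001-3-BARCODE.ext" name the question builds inline,
    // choosing the extension in a single place.
    public static string BuildResultName(int fileCount, int pageCount, string barcode, bool outputPdf)
    {
        string extension = outputPdf ? ".pdf" : ".tif";
        return fileCount.ToString("D4") + "-" + pageCount + "-" + barcode + extension;
    }

    // Reads a PDF at 200 dpi with Group4 compression requested in the read
    // settings (as found in the question), then writes it back out. The ".tif"
    // extension on tiffPath is what makes ImageMagick pick the TIFF format.
    public static void ConvertPdfToGroup4Tiff(string pdfPath, string tiffPath)
    {
        var settings = new MagickReadSettings
        {
            Density = new Density(200),
            Compression = CompressionMethod.Group4
        };

        using (var pages = new MagickImageCollection())
        {
            pages.Read(pdfPath, settings);
            pages.Write(tiffPath);
        }
    }
}
```

In the question's code, the two if (outputPdf) blocks could then collapse into one call such as BuildResultName(outputFileCount, outputPageCount, barcode, outputPdf).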

Related

Cannot Write Multiple Paragraph in Aspose

I have an issue when I try to write multiple paragraphs into an existing Shape. Only the first paragraph is written. I debugged the code and found that the Shape object has all the paragraphs I want. The problem is that when I write to the file, I find only the first one. Here is the project code.
class Program
{
public static void Run()
{
string dataDir = ConfigurationManager.AppSettings["directoryToSave"];
string srcDir = ConfigurationManager.AppSettings["Source"];
string appData = Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData);
string file = Path.Combine(appData, srcDir);
using (Presentation presentation = new Presentation(srcDir))
{
IMasterLayoutSlideCollection layoutSlides = presentation.Masters[0].LayoutSlides;
ILayoutSlide layoutSlide = null;
foreach (ILayoutSlide titleAndObjectLayoutSlide in layoutSlides)
{
if (titleAndObjectLayoutSlide.Name == "TITRE_CONTENU")
{
layoutSlide = titleAndObjectLayoutSlide;
break;
}
}
var contenu = File.ReadAllText(@"E:\DemosProject\PF_GEN\PF_GEN\Source\contenu.txt", Encoding.UTF8);
IAutoShape contenuShape = (IAutoShape)layoutSlide.Shapes.SingleOrDefault(r => r.Name.Equals("contenu"));
ITextFrame txt = ((IAutoShape)contenuShape).TextFrame;
txt.Paragraphs.Clear();
string[] lines = contenu.Split(new[] { Environment.NewLine }, StringSplitOptions.None).Where(str => !String.IsNullOrEmpty(str)).ToArray();
for (int i = 0; i < lines.Length; i++)
{
var portion = new Portion();
portion.Text = lines[i];
var paragraphe = new Paragraph();
paragraphe.Portions.Add(portion);
txt.Paragraphs.Add(paragraphe);
}
presentation.Slides.InsertEmptySlide(0, layoutSlide);
presentation.Save(dataDir + "AddLayoutSlides_out.pptx", SaveFormat.Pptx);
}
}
static void Main(string[] args)
{
try
{
var path = ConfigurationManager.AppSettings["sourceAsposeLicensePath"];
License license = new License();
license.SetLicense(path);
Run();
}
catch (Exception ex)
{
Console.WriteLine("Error" + ex.Message);
}
finally
{
Console.WriteLine("Terminated");
Console.ReadKey();
}
}
}
You can find the ppt file (source file) in the attachment (https://gofile.io/?c=JpBDS8).
Is there anything missing in my code?
Thanks

I have reviewed your requirements and suggest you try the following sample code. In your sample code, you add the paragraphs to a shape inside the LayoutSlide and then add a slide based on that LayoutSlide, expecting it to contain the shape. This approach is not correct. You need to first add a slide based on the LayoutSlide and then add the text to the shape on that slide. The following code should help.
public static void RunParaText()
{
string path = @"C:\Aspose Data\";
string dataDir = path;
string srcDir = path + "Master.pptx";
//string appData = Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData);
//string file = Path.Combine(appData, srcDir);
using (Presentation presentation = new Presentation(srcDir))
{
IMasterLayoutSlideCollection layoutSlides = presentation.Masters[0].LayoutSlides;
ILayoutSlide layoutSlide = null;
foreach (ILayoutSlide titleAndObjectLayoutSlide in layoutSlides)
{
if (titleAndObjectLayoutSlide.Name == "TITRE_CONTENU")
{
layoutSlide = titleAndObjectLayoutSlide;
break;
}
}
var contenu = File.ReadAllText(dataDir+"contenu.txt", Encoding.UTF8);
var slide=presentation.Slides.InsertEmptySlide(0, layoutSlide);
IAutoShape contenuShape = (IAutoShape)slide.Shapes.SingleOrDefault(r => r.Name.Equals("contenu"));
//IAutoShape contenuShape = (IAutoShape)layoutSlide.Shapes.SingleOrDefault(r => r.Name.Equals("contenu"));
ITextFrame txt = ((IAutoShape)contenuShape).TextFrame;
txt.Paragraphs.Clear();
string[] lines = contenu.Split(new[] { Environment.NewLine }, StringSplitOptions.None).Where(str => !String.IsNullOrEmpty(str)).ToArray();
for (int i = 0; i < lines.Length; i++)
{
var portion = new Portion();
portion.Text = lines[i];
var paragraphe = new Paragraph();
paragraphe.Portions.Add(portion);
txt.Paragraphs.Add(paragraphe);
}
//Change font size w.r.t shape size
contenuShape.TextFrame.TextFrameFormat.AutofitType = TextAutofitType.Normal;
presentation.Save(dataDir + "AddLayoutSlides_out.pptx", SaveFormat.Pptx);
}
}
I am working as Support developer/ Evangelist at Aspose.

c# load txt file and split it to X files based on number of lines

This is the code that I've written so far.
It doesn't do the job; it just rewrites every line to the same file over and over again.
*RecordCntPerFile = 10K
*FileNumberName = 1 (file number one)
*Full File name should be something like this: 1_asci_split
string FileFullPath = DestinationFolder + "\\" + FileNumberName + FileNamePart + FileExtension;
using (System.IO.StreamReader sr = new System.IO.StreamReader(SourceFolder + "\\" + SourceFileName))
{
for (int i = 0; i <= (RecordCntPerFile - 1); i++)
{
using (StreamWriter sw = new StreamWriter(FileFullPath))
{
{ sw.Write(sr.Read() + "\n"); }
}
}
FileNumberName++;
}
Dts.TaskResult = (int)ScriptResults.Success;
}

If I understood correctly, you want to split a big file into smaller files with a maximum of 10k lines each. I see 2 problems in your code:
You never change the FileFullPath variable, so you always rewrite the same file.
You create a new StreamWriter on every loop iteration, which truncates the target file each time, and sr.Read() returns a single character code rather than a line.
I rewrote your code to fit the behavior I described. You just have to modify the strings.
int maxRecordsPerFile = 10000;
int currentFile = 1;
using (StreamReader sr = new StreamReader("source.txt"))
{
int currentLineCount = 0;
List<string> content = new List<string>();
while (!sr.EndOfStream)
{
content.Add(sr.ReadLine());
if (++currentLineCount == maxRecordsPerFile || sr.EndOfStream)
{
using (StreamWriter sw = new StreamWriter(string.Format("file{0}.txt", currentFile)))
{
foreach (var line in content)
sw.WriteLine(line);
}
content = new List<string>();
currentFile++;
currentLineCount = 0;
}
}
}
Of course you can do better than that, as you don't need to build that list and loop over it with foreach. I just made this quick example to give you the idea. Improving the performance is up to you.
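One possible way to drop the intermediate list is to stream each line straight into the current output file and roll over to a new file when the limit is reached. This is only a sketch: the class and method names are made up, and the output naming (number + name part + extension, e.g. 1_asci_split.txt) is an assumption based on the naming described in the question.

```csharp
using System;
using System.IO;

// Hypothetical helper; names are illustrative only.
public static class FileSplitter
{
    // Splits sourcePath into files of at most maxLinesPerFile lines each.
    // Returns the number of files written.
    public static int SplitByLineCount(string sourcePath, string destinationFolder,
        string fileNamePart, string extension, int maxLinesPerFile)
    {
        if (maxLinesPerFile <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxLinesPerFile));

        int fileNumber = 0;
        int linesInCurrentFile = 0;
        StreamWriter writer = null;
        try
        {
            // File.ReadLines enumerates lazily, so the whole file is never held in memory.
            foreach (string line in File.ReadLines(sourcePath))
            {
                // Open the first file, or roll over once the current one is full.
                if (writer == null || linesInCurrentFile == maxLinesPerFile)
                {
                    writer?.Dispose();
                    fileNumber++;
                    string name = fileNumber + fileNamePart + extension;
                    writer = new StreamWriter(Path.Combine(destinationFolder, name));
                    linesInCurrentFile = 0;
                }
                writer.WriteLine(line);
                linesInCurrentFile++;
            }
        }
        finally
        {
            writer?.Dispose();
        }
        return fileNumber;
    }
}
```

A call such as FileSplitter.SplitByLineCount(sourceFile, DestinationFolder, "_asci_split", ".txt", 10000) would match the 10K limit from the question. An empty source file produces no output files.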

Asp.Net Mvc Delete file issue

I have an issue with files.
I am building an image importer: clients put their files on an FTP server and can then import them into the application.
During the import process I copy the file from the FTP folder to another folder with File.Copy.
public List<Visuel> ImportVisuel(int galerieId, string[] images)
{
Galerie targetGalerie = MemoryCache.GetGaleriById(galerieId);
List<FormatImage> listeFormats = MemoryCache.FormatImageToList();
int i = 0;
List<Visuel> visuelAddList = new List<Visuel>();
List<Visuel> visuelUpdateList = new List<Visuel>();
List<Visuel> returnList = new List<Visuel>();
foreach (string item in images)
{
i++;
Progress.ImportProgress[Progress.Guid] = "Image " + i + " sur " + images.Count() + " importées";
string extension = Path.GetExtension(item);
string fileName = Path.GetFileName(item);
string originalPath = HttpContext.Current.Request.PhysicalApplicationPath + "Uploads\\";
string destinationPath = HttpContext.Current.Server.MapPath("~/Images/Catalogue") + "\\";
Visuel importImage = MemoryCache.GetVisuelByFilName(fileName);
bool update = true;
if (importImage == null) { importImage = new Visuel(); update = false; }
Size imageSize = importImage.GetJpegImageSize(originalPath + fileName);
FormatImage format = listeFormats.Where(f => f.width == imageSize.Width && f.height == imageSize.Height).FirstOrDefault();
string saveFileName = Guid.NewGuid() + extension;
File.Copy(originalPath + fileName, destinationPath + saveFileName);
if (format != null)
{
importImage.format = format;
switch (format.key)
{
case "Catalogue":
importImage.fileName = saveFileName;
importImage.originalFileName = fileName;
importImage.dossier = targetGalerie;
importImage.dossier_id = targetGalerie.id;
importImage.filePath = "Images/Catalogue/";
importImage.largeur = imageSize.Width;
importImage.hauteur = imageSize.Height;
importImage.isRoot = true;
if (update == false) { MemoryCache.Add(ref importImage); returnList.Add(importImage); }
if (update == true) visuelUpdateList.Add(importImage);
foreach (FormatImage f in listeFormats)
{
if (f.key.StartsWith("Catalogue_"))
{
string[] keys = f.key.Split('_');
string destinationFileName = saveFileName.Insert(saveFileName.IndexOf('.'), "-" + keys[1].ToString());
string destinationFileNameDeclinaison = destinationPath + destinationFileName;
VisuelResizer declinaison = new VisuelResizer();
declinaison.Save(originalPath + fileName, f.width, f.height, 1000, destinationFileNameDeclinaison);
Visuel visuel = MemoryCache.GetVisuelByFilName(fileName.Insert(fileName.IndexOf('.'), "-" + keys[1].ToString()));
update = true;
if (visuel == null) { visuel = new Visuel(); update = false; }
visuel.parent = importImage;
visuel.filePath = "Images/Catalogue/";
visuel.fileName = destinationFileName;
visuel.originalFileName = string.Empty;
visuel.format = f;
//visuel.dossier = targetGalerie; On s'en fout pour les déclinaisons
visuel.largeur = f.width;
visuel.hauteur = f.height;
if (update == false)
{
visuelAddList.Add(visuel);
}
else
{
visuelUpdateList.Add(visuel);
}
//importImage.declinaisons.Add(visuel);
}
}
break;
}
}
}
MemoryCache.Add(ref visuelAddList);
// FONCTION à implémenter
MemoryCache.Update(ref visuelUpdateList);
return returnList;
}
After some processing on the copy (the original file is no longer used),
the client gets a pop-up asking whether he wants to delete the original files in the FTP folder.
If he clicks OK, another method is called on the same controller,
and this method uses:
public void DeleteImageFile(string[] files)
{
for (int i = 0; i < files.Length; i++)
{
File.Delete(HttpContext.Current.Request.PhysicalApplicationPath + files[i].Replace(@"/", @"\"));
}
}
This method works fine and deletes the right files when I use it in other contexts.
But here I get an error message:
The process cannot access the file ... because it is being used by another process.
Does anyone have an idea?
Thank you.
Here's the screenshot of Process Explorer

There are a couple of things you can do here.
1) If you can reproduce it, use Process Explorer at that moment to see which process is locking the file. If it is your own process, make sure you close the file handle once your work is done.
2) Use try/catch around the delete statement and retry after a few seconds to see whether the file handle has been released.
3) If the deletion can happen offline, put the files in a queue and delete them later.
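Option 2 could look roughly like the sketch below. The class and method names are invented for illustration, and the attempt count and delay are up to you. It only retries on IOException (the exception File.Delete throws when the file is in use); other exceptions such as UnauthorizedAccessException are deliberately left to propagate.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper; not part of the original application.
public static class FileCleanup
{
    // Tries to delete the file up to maxAttempts times, waiting `delay`
    // between attempts. Returns true once the delete call succeeds,
    // false if every attempt failed with an IOException.
    public static bool TryDeleteWithRetry(string path, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                File.Delete(path);
                return true;
            }
            catch (IOException)
            {
                // Typically "file is being used by another process".
                if (attempt == maxAttempts)
                {
                    return false;
                }
                Thread.Sleep(delay);
            }
        }
        return false;
    }
}
```

Retrying only hides the symptom if your own code is the one holding the handle, so option 1 is still worth doing first.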

You can solve this by using C# locks. Just put your code inside a lock statement so that your threads are safe and wait for each other to finish processing.

I found the solution:
In my import method, there is a call to this method:
public void Save(string originalFile, int maxWidth, int maxHeight, int quality, string filePath)
{
Bitmap image = new Bitmap(originalFile);
Save(ref image, maxWidth, maxHeight, quality, filePath);
}
The Bitmap keeps the file open, which blocks the delete.
I just added
image.Dispose();
to the method and it works fine.
Thank you for your help, and thank you for Process Explorer. Very useful tool.
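As a side note, a using block gives the same release even when an exception is thrown halfway through. The sketch below shows the pattern on a made-up helper (ReadImageSize is not from the original code). One caveat for the Save method above: a using variable cannot be passed by ref, so with the Save(ref image, ...) overload the equivalent would be a try/finally that calls image.Dispose().

```csharp
using System.Drawing;

// Hypothetical helper illustrating the dispose pattern.
public static class ImageFileHelper
{
    // Opens the image, reads its size, and releases the file handle
    // as soon as the using block exits, so the file can be deleted afterwards.
    public static Size ReadImageSize(string path)
    {
        using (var image = new Bitmap(path))
        {
            return image.Size;
        }
    }
}
```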

Need scan through text files in a directory and filter based on loop results

I have a program that scans through text files in a directory, loops through each line and parses based on a prefix in each line. The program acts as an extractor, extracting a tif image from a Base64 string on prefix "Hxx". When the program gets the image from line "Hxx", it simply deletes the original file.
What I would like to do is keep the filtering conditions for line "TXA" but instead of converting the string on line "Hxx" to an image and deleting the file, I would like to keep the entire contents of the file. Basically, only using the program to parse and filter through the text files based on conditions for line "TXA".
I know in case "TXA" of my foreach loop, I need to save the entire file into a memory stream to re-write the file towards the end of the program. I'm just not sure how at the moment.
Any help is greatly appreciated.
/// <summary>
/// This method will open, read and parse out the image file and save it to disk.
/// </summary>
/// <param name="fileName"></param>
/// <returns></returns>
static bool ParseHFPFile(string inputFileName, string outputFileName)
{
List<MemoryStream> tiffStreamList = new List<MemoryStream>();
// 1. grab file contents.
string fileContents = ProgramHelper.GetFileContents(inputFileName);
if (string.IsNullOrEmpty(fileContents))
{
return false; // errors already raised.
}
Log("[O] ", false);
// 2. split file contents into a string array.
string[] fileContentStringList = fileContents.Split('\r');
if (fileContentStringList.Length == 0)
{
Log(" ERROR: Unable to split file contents on [CR] character.");
return false;
}
// 3. loop through the file lines and parse each "section" based on it's prefix.
string mrn = string.Empty;
string dos = string.Empty;
string imgType = string.Empty;
foreach (string line in fileContentStringList)
{
if (string.IsNullOrEmpty(line))
{
continue;
}
string prefix = line.Substring(0, 3);
switch (prefix)
{
case "MSH":
break;
case "EVN":
break;
case "PID":
mrn = line.Split('|')[3].Split('^')[0];
break;
case "PV1":
dos = line.Split('|')[44];
if (!String.IsNullOrWhiteSpace(dos))
{
dos = dos.Substring(0, 8);
}
break;
case "TXA":
imgType = line.Split('|')[2].Split('^')[0];
if (imgType == "EDH02" || imgType == "EDORDH")
{
Log("[NP]");
return true;
}
break;
case "OBX":
break;
case "Hxx":
// 0 - Hxx
// 1 - page number
// 2 - image type
// 3 - base64 encoded image.
// 1. split the line sections apart based on the pipe character ("|").
string[] hxxSections = line.Split('|');
byte[] decodedImageBytes = Convert.FromBase64String(hxxSections[3].Replace(@"\.br\", ""));
// 2. create a memory stream to store the byte array.
var ms = new MemoryStream();
ms.Write(decodedImageBytes, 0, decodedImageBytes.Length);
// 3. add the memory stream to a memory stream array for later use in saving.
tiffStreamList.Add(ms);
break;
case "Z":
break;
}
}
Log("[P] ", false);
// 4. write the memory streams to a new file.
ImageCodecInfo icInfo = ImageCodecInfo.GetImageEncoders().Single(c => c.MimeType == "image/tiff");
System.Drawing.Imaging.Encoder enc = System.Drawing.Imaging.Encoder.SaveFlag;
var ep = new EncoderParameters(1);
// 5. create references to the EncoderValues we will use
var ep1 = new EncoderParameter(enc, (long)EncoderValue.MultiFrame);
var ep2 = new EncoderParameter(enc, (long)EncoderValue.FrameDimensionPage);
var ep3 = new EncoderParameter(enc, (long)EncoderValue.Flush);
string newOutputFilePath = Path.GetDirectoryName(outputFileName) + @"\";
string newOutputFileName = newOutputFilePath + mrn + "_" + dos + ".dat";
bool success = false;
int suffix = 1;
while (!success)
{
if (File.Exists(newOutputFileName))
{
newOutputFileName = newOutputFilePath + mrn + "_" + dos + "_" + suffix + ".dat";
}
else
{
success = true;
}
suffix++;
}
Log(string.Format("[NewFile: {0}] ", Path.GetFileName(newOutputFileName)), false);
var strm = new FileStream(newOutputFileName, FileMode.OpenOrCreate, FileAccess.ReadWrite);
System.Drawing.Image pages = null;
int frame = 0;
int pageCount = tiffStreamList.Count;
Log("[WT:", false);
try
{
foreach (MemoryStream m in tiffStreamList)
{
if (frame == 0)
{
ep.Param[0] = ep1;
pages = Image.FromStream(m, false, false);
pages.Save(strm, icInfo, ep);
}
else
{
ep.Param[0] = ep2;
if (pages != null)
pages.SaveAdd(Image.FromStream(m, false, false), ep);
}
if (frame == pageCount - 1)
{
ep.Param[0] = ep3;
if (pages != null)
pages.SaveAdd(ep);
}
frame++;
Log(".", false);
//m.Close();
m.Dispose();
}
Log("]");
return true;
}
catch (Exception ex)
{
Log(" EXCEPTION: " + ex.Message + Environment.NewLine + ex.StackTrace);
return false;
}
finally
{
}
}

I'm happy that you succeeded with your task. If my comment helped you solve the problem I'm gonna post it as an answer so you can accept it if you want.
So, instead of looking for more complicated ways you can just go and create two List<string> like:
List<string> TXA = new List<string>();
List<string> FilesForDelete = new List<string>();
and then in your code, depending on what you want to do with the file:
if (fileIsTXA)
{
TXA.Add(fileName);
}
else
{
FilesForDelete.Add(fileName);
}
Later on you can use those two lists to extract the file names and do whatever you want with them.
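To make the idea a bit more concrete, here is a rough sketch of how fileIsTXA could be computed without touching the file contents. It reuses the parsing from the question (split on CR, then field 2 of the TXA line, then the part before the caret). The class and method names are hypothetical, and treating "EDH02" / "EDORDH" as the condition for the TXA list is my assumption about what you want to filter on.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; names and the meaning of the two lists are assumptions.
public static class HfpFilter
{
    private static readonly HashSet<string> FilteredTypes =
        new HashSet<string> { "EDH02", "EDORDH" };

    // Returns true when the content has a TXA line whose image type
    // (field 2, before the '^') is one of the filtered types.
    public static bool HasFilteredTxaType(string fileContents)
    {
        foreach (string rawLine in fileContents.Split('\r'))
        {
            // Tolerate CRLF line endings by dropping a leading LF.
            string line = rawLine.TrimStart('\n');
            if (!line.StartsWith("TXA", StringComparison.Ordinal))
            {
                continue;
            }
            string[] fields = line.Split('|');
            if (fields.Length > 2 && FilteredTypes.Contains(fields[2].Split('^')[0]))
            {
                return true;
            }
        }
        return false;
    }

    // Sorts file names into the two lists; the files themselves are left untouched.
    public static void Classify(IEnumerable<string> fileNames,
        List<string> txa, List<string> filesForDelete)
    {
        foreach (string fileName in fileNames)
        {
            if (HasFilteredTxaType(File.ReadAllText(fileName)))
            {
                txa.Add(fileName);
            }
            else
            {
                filesForDelete.Add(fileName);
            }
        }
    }
}
```

Because nothing is rewritten, there is no need to hold the file in a MemoryStream; the original file simply stays on disk.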

Read Image file metadata

I want to upload an image file and then extract its basic information (author, dimensions, date created, date modified, etc.) and display it to the user. How can I do it?
A solution or reference for this problem in ASP.NET C# code would be helpful, but JavaScript or PHP would be OK as well.

Check this link. It explains GetDetailsOf() and the file properties it exposes for each Windows version.
If you want to use C# code, use the code below to get the metadata:
List<string> arrHeaders = new List<string>();
Shell shell = new ShellClass();
Folder rFolder = shell.NameSpace(_rootPath);
FolderItem rFiles = rFolder.ParseName(filename);
for (int i = 0; i < short.MaxValue; i++)
{
string value = rFolder.GetDetailsOf(rFiles, i).Trim();
arrHeaders.Add(value);
}

C# solution could be found here:
Link1
Link2

Bitmap image = new Bitmap(fileName);
PropertyItem[] propItems = image.PropertyItems;
foreach (PropertyItem item in propItems)
{
Console.WriteLine("iD: 0x" + item.Id.ToString("x"));
}
MSDN Reference
C# Tutorial Reference
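Building on the PropertyItems loop above, a sketch along these lines could pull out a few of the values the question asks about. The tag IDs used are the standard EXIF/TIFF ones (0x013B Artist, 0x0132 DateTime); the class and method names are made up, and many images simply will not carry these tags, in which case null is returned. File creation and modification dates come from the file system via FileInfo, not from the image.

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Text;

// Hypothetical helper; not taken from the linked references.
public static class ImageMetadataReader
{
    public const int TagDateTime = 0x0132; // EXIF/TIFF "DateTime"
    public const int TagArtist = 0x013B;   // EXIF/TIFF "Artist"
    private const short AsciiType = 2;     // PropertyItem.Type for ASCII strings

    // Returns the ASCII value of the given tag, or null if the image has no such tag.
    public static string ReadAsciiTag(Image image, int tagId)
    {
        foreach (PropertyItem item in image.PropertyItems)
        {
            if (item.Id == tagId && item.Type == AsciiType && item.Value != null)
            {
                return Encoding.ASCII.GetString(item.Value).TrimEnd('\0');
            }
        }
        return null;
    }

    // Collects dimensions, author, and dates into one display string.
    public static string Describe(string path)
    {
        var info = new FileInfo(path);
        using (Image image = Image.FromFile(path))
        {
            return image.Width + "x" + image.Height
                + "; author=" + (ReadAsciiTag(image, TagArtist) ?? "(none)")
                + "; taken=" + (ReadAsciiTag(image, TagDateTime) ?? "(none)")
                + "; created=" + info.CreationTime.ToString("u")
                + "; modified=" + info.LastWriteTime.ToString("u");
        }
    }
}
```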

try this...
private string doUpload()
{
// Initialize variables
string sSavePath;
sSavePath = "images/";
// Check file size (mustn’t be 0)
HttpPostedFile myFile = FileUpload1.PostedFile;
int nFileLen = myFile.ContentLength;
if (nFileLen == 0)
{
//**************
//lblOutput.Text = "No file was uploaded.";
return null;
}
// Check file extension (must be JPG)
if (System.IO.Path.GetExtension(myFile.FileName).ToLower() != ".jpg")
{
//**************
//lblOutput.Text = "The file must have an extension of JPG";
return null;
}
// Read file into a data stream
byte[] myData = new Byte[nFileLen];
myFile.InputStream.Read(myData, 0, nFileLen);
// Make sure a duplicate file doesn’t exist. If it does, keep on appending an
// incremental numeric until it is unique
string sFilename = System.IO.Path.GetFileName(myFile.FileName);
int file_append = 0;
while (System.IO.File.Exists(Server.MapPath(sSavePath + sFilename)))
{
file_append++;
sFilename = System.IO.Path.GetFileNameWithoutExtension(myFile.FileName)
+ file_append.ToString() + ".jpg";
}
// Save the stream to disk
System.IO.FileStream newFile
= new System.IO.FileStream(Server.MapPath(sSavePath + sFilename),
System.IO.FileMode.Create);
newFile.Write(myData, 0, myData.Length);
newFile.Close();
return sFilename;
}
