Removable Images in PDF - c#

I'm writing an ASP.NET program that inserts PNG images into a PDF. Ideally, I'd like to give users the option to undo this action later on.
I'm currently using the PDFSharp library to add the images, which is working fine, but I can't find any way to then remove them. I've looked into the annotation functionality a little, but as far as I can tell there's no way to use custom images.
Does anyone have any experience or insight into this issue? Thanks.

PDF is a final presentation format and is not really designed for this kind of modification. It is possible to do this with other libraries, but in your particular case I would recommend keeping a copy of the original file rather than trying to revert the change.
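The "keep a copy" approach can be sketched as follows. This is only an illustration: the class name, the method names, and the `.orig` suffix are hypothetical, and a real application might store the original in a database or a separate folder instead.

```csharp
using System;
using System.IO;

public static class PdfBackup
{
    // Copies the untouched PDF next to itself (hypothetical ".orig" suffix)
    // before any images are added, and returns the backup path.
    public static string BackUpOriginal(string pdfPath)
    {
        string backupPath = pdfPath + ".orig";
        // Keep only the first backup, so repeated edits never overwrite the pristine copy.
        if (!File.Exists(backupPath))
        {
            File.Copy(pdfPath, backupPath);
        }
        return backupPath;
    }

    // "Undo": restore the pristine copy over the modified file.
    public static void RestoreOriginal(string pdfPath)
    {
        File.Copy(pdfPath + ".orig", pdfPath, overwrite: true);
    }
}
```

Call `BackUpOriginal` before handing the file to PDFSharp, and `RestoreOriginal` when the user asks to undo.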

Related

Converting pdf to tiff using C# .net 3.0 or less without 3rd party libraries?

Several SO posts and some googling did not really help with my question, so here I go again.
I need to convert a PDF to a single TIFF image (a multi-page TIFF, obviously). I have figured out the TIFF creation part; the issue is extracting an image/bitmap from the PDF. Of course C#/.NET does not have built-in functions for this, but there should be a way to do it.
As for why I don't want to use third-party libraries: most are not free, and even the free ones may not be usable in all environments for security reasons. More than anything, I'm just curious how to do it, and in some posts this question is treated as a sin :).
Any proper methods, ideas, or pointers on where to start would be helpful. I would prefer a WPF-based solution to a GDI+-based one, as I have seen issues with GDI+ TIFF creation on Windows servers. I was under the impression that creating a PDF is the harder direction, and of course I understand that if this were easy it would already be in .NET.
Edit: For a start, handling a PDF with a simple structure would be enough. It does not need to support every type of PDF.
Even with a third-party tool it's not going to be easy :) See: Convert a PDF into a series of images using C# and GhostScript
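One way to drive Ghostscript from C# is to launch its command-line executable; this is a sketch of that idea and not necessarily how the linked article does it. The class and method names are hypothetical, and the executable name is an assumption (typically `gswin64c.exe` on Windows or `gs` elsewhere). As far as I know, Ghostscript's TIFF devices write all pages into one file when the output name contains no `%d` placeholder, which matches the multi-page TIFF requirement.

```csharp
using System;
using System.Diagnostics;

public static class PdfToTiff
{
    // Builds the Ghostscript argument string for a 24-bit colour TIFF.
    public static string BuildArguments(string pdfPath, string tiffPath, int dpi)
    {
        return string.Format(
            "-dNOPAUSE -dBATCH -dSAFER -sDEVICE=tiff24nc -r{0} \"-sOutputFile={1}\" \"{2}\"",
            dpi, tiffPath, pdfPath);
    }

    // ghostscriptExe is an assumption: pass the path of the installed executable.
    public static int Convert(string ghostscriptExe, string pdfPath, string tiffPath, int dpi)
    {
        var startInfo = new ProcessStartInfo(ghostscriptExe, BuildArguments(pdfPath, tiffPath, dpi))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode; // 0 means Ghostscript reported success
        }
    }
}
```

Example call: `PdfToTiff.Convert("gswin64c.exe", "in.pdf", "out.tif", 300)`.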

Server Side HTML to PDF

I'm trying to find a C# library that will let me "print" one of my HTML pages to a PDF file, but I can't work out whether one exists. I've found several that let you build a page programmatically, but none that clearly generate the PDF directly from HTML.
EDIT: I'm not allowed a budget for this at work, so it will need to be an open source/free product. Failing that, I'm aware of iTextSharp and will have to generate the PDF programmatically (which is what I'm hoping to avoid :) )
I've had a lot of luck with ActivePDF WebGrabber. It's kind of odd to use compared to standard managed libraries (ActivePDF is unmanaged), but it gets the job done.
iTextSharp comes with a little companion: XML Worker
For a demo, have a look here
Even though the documentation refers to the Java API, the adaptation to C# should be straightforward.
I've experimented with iTextSharp and it works for basic conversion, but it gets complicated once you get into styles and formatting. I've also heard of wkhtmltopdf as another option.

Programmatically change PNG / PSD image colour in .Net

I have a number of web controls, which are made up of png images. The simplest is a button.
I need to be able to generate these controls with different colours depending on the colour selected by the client.
The images are .PSD files, layered before exporting to png.
My idea was to let the client pick one colour, use a layer filter in the PSD to change the overall colour of the image, and programmatically export the PSD to PNG on the server. I looked into using the Photoshop CS interface via COM, but haven't got my head around it. Has anyone else used it for a similar task?
Alternatively, I could read the PNG into memory and perform colour replacement, but this seems really complex for what reads like a simple(ish) task.
Many thanks in advance
PSD is a quite complicated and poorly documented file format that is constantly receiving new features from Adobe, so editing PSD files is by no means an easy task.
One way is to use Photoshop batch processing. That means Photoshop must be installed on the server, but since you were already planning to drive it through COM, that should not be a problem.
One starting point may be: http://www.webdesignerdepot.com/2008/11/photoshop-droplets-and-imagemagick/
Another way would be to composite the layers in C#: some layers would be prepared in advance (textures, borders, etc.), some would be created at runtime, and all of them would be merged at runtime in C#.
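Both the colour replacement mentioned in the question and the layer compositing suggested above come down to per-pixel arithmetic. Here is a minimal sketch, assuming the pixels have already been decoded into packed 32-bit ARGB values (0xAARRGGBB, non-premultiplied alpha). The class and method names are hypothetical, and decoding or encoding the PNG itself (for example with System.Drawing or WPF imaging) is not shown.

```csharp
using System;

public static class LayerTint
{
    // Recolours one pixel: keeps its alpha and scales the chosen colour
    // by the pixel's brightness, so a grey button keeps its shading.
    public static uint Tint(uint argb, byte r, byte g, byte b)
    {
        uint a = argb >> 24;
        uint pr = (argb >> 16) & 0xFF, pg = (argb >> 8) & 0xFF, pb = argb & 0xFF;
        uint lum = (pr * 299 + pg * 587 + pb * 114) / 1000; // brightness, 0..255
        uint nr = r * lum / 255, ng = g * lum / 255, nb = b * lum / 255;
        return (a << 24) | (nr << 16) | (ng << 8) | nb;
    }

    // Blends one colour channel (selected by shift) of two pixels.
    private static uint BlendChannel(uint top, uint bottom, uint topWeight,
                                     uint bottomWeight, uint outAlpha, int shift)
    {
        uint tc = (top >> shift) & 0xFF, bc = (bottom >> shift) & 0xFF;
        return (tc * topWeight + bc * bottomWeight) / outAlpha;
    }

    // Standard "over" operator: draws the top pixel over the bottom pixel.
    public static uint Over(uint top, uint bottom)
    {
        uint ta = top >> 24, ba = bottom >> 24;
        uint bw = ba * (255 - ta) / 255; // how much of the bottom layer shows through
        uint oa = ta + bw;
        if (oa == 0) return 0; // both pixels fully transparent
        return (oa << 24)
             | (BlendChannel(top, bottom, ta, bw, oa, 16) << 16)
             | (BlendChannel(top, bottom, ta, bw, oa, 8) << 8)
             | BlendChannel(top, bottom, ta, bw, oa, 0);
    }

    // Merges two equally sized layers pixel by pixel.
    public static uint[] Composite(uint[] bottom, uint[] top)
    {
        if (bottom.Length != top.Length)
            throw new ArgumentException("Layers must have the same number of pixels.");
        var result = new uint[bottom.Length];
        for (int i = 0; i < result.Length; i++)
            result[i] = Over(top[i], bottom[i]);
        return result;
    }
}
```

Tinting a greyscale layer with the client's colour and then compositing it over the fixed layers avoids touching the PSD at runtime; only the exported layers are needed on the server.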

Batch conversion of docx to clean HTML

I'm starting to wonder if this is even possible. I've searched for solutions on Google and come up with nothing that works exactly how I'd like it to.
It would help to explain what that entails. I work for the database group in my university's IT department. My main job is to take the specs of a report in a docx file, copy them over to Dreamweaver, fix some formatting, and put the result on their website. My issue is that it's ridiculously tedious to do this over and over. I figured, hey, I haven't written anything in C# for some time, so perhaps I could write an application to grab a docx file, convert it to HTML, fix the CSS, add the header and footer from the webpage, and save the result. I originally planned to have it process files one by one, but it probably wouldn't be difficult to have it take a list of files and batch convert them.
I've found these relevant topics on how to accomplish this, but they don't fit my needs well enough.
http://www.techrepublic.com/blog/howdoi/how-do-i-modify-word-documents-using-c/190
This is probably fine for a few documents, but since it's just automating an instance of Word, I feel like it'd be slow and memory intensive. I'd prefer to avoid opening and closing an instance of Word 50+ times.
http://openxmldeveloper.org/articles/333.aspx
This is what I started using. XSLT has the benefit of not needing Word to be installed or run for each file. After some searching I got a proof of concept working. It takes in a docx file, decompresses it, grabs the document.xml from it, and applies the DocX2Html.xsl file I scavenged from OpenXML Viewer. I believe that was originally provided by MS for SharePoint servers so they could render Word documents in a browser, or something along those lines.
After adjusting that code to fit my needs, and having issues with the objXSLT.Load() method, I ended up using ILMerge to make the XSL into a DLL. No idea why I kept getting a compile error when using the plain old XSL file, but the DLL worked fine, so I was satisfied. Here (http://pastebin.com/a5HBAakJ) is my current code. It does the job of converting docx to HTML just fine (other than random spaces between some words), but the resulting file has ridiculously ugly HTML. An example of this monstrosity can be found here (http://pastebin.com/b8sPGmFE).
Does anyone know how I could remedy this? I'm thinking perhaps I need to make a new XSL file, as the one MS provided is what's responsible for sticking all those tags and extra code in there. My issue with that is that I don't know anything about how to do that. Perhaps there's an alternative version already out there. All I'd need is one that will preserve tables and text formatting. Images aren't needed.
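For readers following along, the unzip-and-transform pipeline described in the question can be sketched with the framework's own ZIP and XSLT classes. This is not the asker's code from the pastebin link; the class and method names are hypothetical, `System.IO.Compression.ZipArchive` requires .NET 4.5 or later, and a real stylesheet such as DocX2Html.xsl may need more package parts than document.xml alone.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml;
using System.Xml.Xsl;

public static class DocxToHtml
{
    // A .docx file is a ZIP package; the main body lives in word/document.xml.
    public static string ReadDocumentXml(Stream docxStream)
    {
        using (var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; is this a .docx file?");
            using (var reader = new StreamReader(entry.Open()))
                return reader.ReadToEnd();
        }
    }

    // Applies a stylesheet (supplied as XSLT text) to the document XML.
    public static string Transform(string documentXml, string xsltText)
    {
        var xslt = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(xsltText)))
            xslt.Load(xsltReader);
        using (var input = XmlReader.Create(new StringReader(documentXml)))
        using (var output = new StringWriter())
        {
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

For batch conversion, the compiled transform could be loaded once and reused for every file, since loading the stylesheet is usually the expensive step.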
This looks like just what you need: http://msdn.microsoft.com/en-us/library/ff628051(v=office.14).aspx
The author, Eric White, blogged about his experiences developing that tool. You can see the list of posts on his blog here: http://blogs.msdn.com/b/ericwhite/archive/2008/10/20/eric-white-s-blog-s-table-of-contents.aspx#Open_XML_to_XHtml
Since I'm a big fan of Aspose.Words, a commercial library to create/process Word documents, I would do something like:
Open the Word document with Aspose.Words.
Save the Word document as HTML.
Use something like SgmlReader or HTML Agility Pack (or even Regular Expressions if it is suitable) to remove unwanted HTML tags/attributes.
Since you wrote that you work at a university, I'm not sure whether commercial packages are an option, though.
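The cleanup step can be sketched with regular expressions alone, for the simple cases where that is suitable. This is an illustration under assumptions: the class name is hypothetical, the attribute and tag lists are just examples of typical Word export noise, and a real HTML parser such as HTML Agility Pack is more robust for anything irregular (for instance, text content that happens to look like an attribute).

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlCleaner
{
    // Removes style/class/lang attributes (quoted values only).
    private static readonly Regex Attributes = new Regex(
        @"\s+(?:style|class|lang)\s*=\s*(?:""[^""]*""|'[^']*')",
        RegexOptions.IgnoreCase);

    // Removes <span> and <font> wrappers but keeps their inner text.
    private static readonly Regex Wrappers = new Regex(
        @"</?(?:span|font)\b[^>]*>", RegexOptions.IgnoreCase);

    public static string Clean(string html)
    {
        string withoutAttributes = Attributes.Replace(html, "");
        return Wrappers.Replace(withoutAttributes, "");
    }
}
```

Tables and basic formatting tags such as `<b>` or `<table>` pass through untouched, which matches the requirement to preserve tables and text formatting.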
Hi, I'm not sure what the rules are on promoting your own solutions, so do let me know if I am out of line.
I am a web developer who had the same issues, so I created my own tool:
http://www.convertwordtohtml.com
We are also working on a new version that will have even better conversion quality and one-click conversion, e.g. you can right-click a Word file and it will be converted directly to HTML, with the code placed on the clipboard. The current version also supports command-line access, and the new version will have a server version too.
There is a free trial version downloadable from the site, and if you have any questions, do contact me any time.

Compare PDF documents in Adobe Acrobat via SDK

We are planning to implement a solution for comparing different revisions of a PDF document in our .NET Windows Forms application. Adobe Acrobat has a nice feature for comparing two documents, but I have not been able to find any information on whether it is possible to use this feature from our application, through a plug-in or some other mechanism.
I would really appreciate it if any of you could point me toward how to build such a solution.
I have also looked at other threads here on Stack Overflow about comparing PDF documents, particularly these:
How to compare two PDF-files
PDF-libraries
I did not find a library or SDK there that would let us compare PDF documents in a way that is easy for users of the system to understand.
Do you know any good solutions to solve this problem?
All help appreciated! :)
Do you know the structure of the PDF files, or do you want to compare them without knowing it? If you know the files, you can read the values of specific fields and compare those values between files, instead of comparing the entire PDF.
