XSL + XML -> PDF for C#

I know several people have asked questions like this, but none of the answers solved my problem.
I have an XSL stylesheet and an XML file and want to generate a PDF with a processor like Apache FOP.
I cannot use Java programs; I can only use C# libraries or executables.
I tried nFop:
Version 1.x uses Java.io and..
Version 2.0 doesn't have the ability to set XsltSettings
My current software transforms XSL + XML -> HTML (using the standard System.Xml.Xsl in C#) and uses wkhtmltopdf to generate a PDF from the resulting HTML.
But tables are split when they are too long for the page, and the continuation on the next page has no column headers (this is very important for my problem).
I think there is no free FO processor for pure C#.

Have a look at FoNET.
public static bool XMLToPDF(string pXmlFile, string pXslFile, string pFoFile, string pPdfFile)
{
    string lBaseDir = System.IO.Path.GetDirectoryName(pXslFile);

    // Step 1: apply the XSLT stylesheet to the XML to produce the XSL-FO file.
    XslCompiledTransform lXslt = new XslCompiledTransform();
    lXslt.Load(pXslFile);
    lXslt.Transform(pXmlFile, pFoFile);

    // Step 2: render the XSL-FO file to PDF. The using blocks dispose both
    // streams even if rendering throws.
    using (FileStream lFileInputStreamFo = new FileStream(pFoFile, FileMode.Open))
    using (FileStream lFileOutputStreamPDF = new FileStream(pPdfFile, FileMode.Create))
    {
        FonetDriver lDriver = FonetDriver.Make();
        lDriver.BaseDirectory = new DirectoryInfo(lBaseDir);
        lDriver.CloseOnExit = true;
        lDriver.Render(lFileInputStreamFo, lFileOutputStreamPDF);
    }
    return System.IO.File.Exists(pPdfFile);
}
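Regarding the missing column headers: in XSL-FO, rows placed inside fo:table-header are repeated whenever the table breaks across pages, unless table-omit-header-at-break="true" is set. The sketch below shows the XSL + XML -> FO step in memory with such a header. The stylesheet, the rows/row/name/qty element names and the FoSketch class are made up for illustration, and how completely a given FO processor honors the repeated header is something to verify against its documentation.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class FoSketch
{
    // Hypothetical stylesheet: turns <rows><row><name/><qty/></row>...</rows> into XSL-FO.
    private const string Stylesheet = @"<?xml version='1.0'?>
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:fo='http://www.w3.org/1999/XSL/Format'>
  <xsl:template match='/rows'>
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name='page' page-height='29.7cm' page-width='21cm' margin='2cm'>
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference='page'>
        <fo:flow flow-name='xsl-region-body'>
          <fo:table table-layout='fixed' width='100%'>
            <fo:table-column column-width='50%'/>
            <fo:table-column column-width='50%'/>
            <!-- Header rows: repeated on every page the table spans. -->
            <fo:table-header>
              <fo:table-row>
                <fo:table-cell><fo:block>Name</fo:block></fo:table-cell>
                <fo:table-cell><fo:block>Qty</fo:block></fo:table-cell>
              </fo:table-row>
            </fo:table-header>
            <fo:table-body>
              <xsl:for-each select='row'>
                <fo:table-row>
                  <fo:table-cell><fo:block><xsl:value-of select='name'/></fo:block></fo:table-cell>
                  <fo:table-cell><fo:block><xsl:value-of select='qty'/></fo:block></fo:table-cell>
                </fo:table-row>
              </xsl:for-each>
            </fo:table-body>
          </fo:table>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>
</xsl:stylesheet>";

    // Applies the stylesheet to the given XML string and returns the XSL-FO text.
    public static string ToFo(string xml)
    {
        var xslt = new XslCompiledTransform();
        using (var xslReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            // XsltSettings can be passed here, which nFop 2.0 reportedly did not allow.
            xslt.Load(xslReader, XsltSettings.Default, null);
        }
        using (var xmlReader = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            xslt.Transform(xmlReader, null, output);
            return output.ToString();
        }
    }
}
```

The returned string could be written to the FO file that the answer's XMLToPDF method passes to the renderer.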

Related

Using C# iText 7 to flatten an XFA PDF

Is it possible to use iText 7 to flatten an XFA PDF? I'm only seeing Java documentation about it (http://developers.itextpdf.com/content/itext-7-examples/itext-7-form-examples/flatten-xfa-using-pdfxfa).
It seems you can do this with iTextSharp, however.
I believe it's not an AcroForm PDF, because doing something similar to this answer (How to flatten pdf with Itext in c#?) simply created a PDF that wouldn't open properly.

It looks like you have to use iTextSharp and not iText 7. Judging by the NuGet versions, iTextSharp is essentially the .NET version of iText 5 and, as Bruno mentioned in the comments above, the XFA functionality simply hasn't been ported to iText 7 for .NET.
The confusion stemmed from having both iText 7 and iTextSharp packages on NuGet; also, the trial page didn't state that the XFA Worker wasn't available for the .NET version of iText 7 (yet?).
I did the following to accomplish what I needed, at least for a trial:
Request a trial copy here: http://demo.itextsupport.com/newslicense/
You'll be emailed an XML license key; you can just place it on your desktop for now.
Create a new console application in Visual Studio.
Open the Package Manager Console, type in the following and press ENTER (this will install other dependencies as well):
Install-Package itextsharp.xfaworker
Use the following code:
static void Main(string[] args)
{
    ValidateLicense();
    var sourcePdfPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory), "<your_xfa_pdf_file>");
    var destinationPdfPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory), "output.pdf");
    FlattenPDF(sourcePdfPath, destinationPdfPath);
}

private static void ValidateLicense()
{
    var licenseFileLocation = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory), "itextkey.xml");
    iTextSharp.license.LicenseKey.LoadLicenseFile(licenseFileLocation);
}

private static void FlattenPDF(string sourcePdfPath, string destinationPdfPath)
{
    using (var sourcePdfStream = File.OpenRead(sourcePdfPath))
    {
        var document = new iTextSharp.text.Document();
        var writer = iTextSharp.text.pdf.PdfWriter.GetInstance(document, new FileStream(destinationPdfPath, FileMode.Create));
        var xfaf = new iTextSharp.tool.xml.xtra.xfa.XFAFlattener(document, writer);
        sourcePdfStream.Position = 0;
        xfaf.Flatten(new iTextSharp.text.pdf.PdfReader(sourcePdfStream));
        document.Close();
    }
}
The trial will put a huge watermark on the resulting PDF, but at least you can get it working and see how the full license should work.

For iText 7, this can be done as follows:
LicenseKey.LoadLicenseFile(@"Path of the license file");
MemoryStream dest_File = new MemoryStream();
XFAFlattener xfaFlattener = new XFAFlattener();
xfaFlattener.Flatten(new MemoryStream(File.ReadAllBytes(@"C:\Unflattened file")), dest_File);
File.WriteAllBytes("flatten.pdf", dest_File.ToArray());

scraping data from website with a C# console application

I'm trying to learn Spanish and am making some flash cards (for my personal use) to help me learn the verbs.
Here is an example, page example. Near the top of the page you will see the past participle (bloqueado) and the gerund (bloqueando). These are the two values I want to obtain in my code and use for my flash cards.
If this is possible, I will use a C# console application. I am aware that scraping data from a website is not ideal; however, this is a one-off.
Any guidance on how to start something like this and pitfalls to avoid would be very helpful!

I know this isn't an exact answer, but here is the process I would suggest.
Use wget (https://www.gnu.org/software/wget/) to mirror the website to a folder. Wget is a web spider and will follow the links on the site until it has downloaded everything. You'll have to run it with a few different parameters until you find the settings you want.
Use C# to run through each file in the folder and extract the words from <section class="verb-mood-section"> in each file. It's up to you whether to output them to the console or store them in a database or flat file.
Should be that easy, in theory.

Use SGMLReader. SGMLReader is a versatile and robust component that exposes HTML as an XmlReader:
XmlDocument FromHtml(TextReader reader) {
    // setup SgmlReader
    Sgml.SgmlReader sgmlReader = new Sgml.SgmlReader();
    sgmlReader.DocType = "HTML";
    sgmlReader.WhitespaceHandling = WhitespaceHandling.All;
    sgmlReader.CaseFolding = Sgml.CaseFolding.ToLower;
    sgmlReader.InputStream = reader;

    // create document
    XmlDocument doc = new XmlDocument();
    doc.PreserveWhitespace = true;
    doc.XmlResolver = null;
    doc.Load(sgmlReader);
    return doc;
}
You can see that you need to create a TextReader first. In practice this would be a StreamReader, as TextReader is an abstract class.
Then you create the XmlDocument over that. Once the HTML is loaded into the XmlDocument, you can use the various methods it supports to isolate and extract the nodes you need. I'll leave you to explore that aspect of it.
You might try using the XDocument class instead, as it's a lot easier to handle than XmlDocument, especially if you're a newbie. It also supports LINQ.
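As a rough sketch of the XDocument route, the helper below collects the text of every element carrying a given class attribute. The VerbExtractor class and the class names in the usage example are hypothetical; the real page's markup would need to be inspected first. Since SgmlReader is an XmlReader, the document could presumably be obtained with XDocument.Load(sgmlReader) rather than from a string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class VerbExtractor
{
    // Returns the trimmed text of every element whose class attribute
    // equals cssClass exactly (multi-valued class attributes are not handled).
    public static List<string> TextByClass(XDocument doc, string cssClass)
    {
        return doc.Descendants()
                  .Where(e => (string)e.Attribute("class") == cssClass)
                  .Select(e => e.Value.Trim())
                  .ToList();
    }
}
```

For example, with a made-up fragment:

```csharp
var doc = XDocument.Parse(
    "<html><body><span class=\"gerund\"> bloqueando </span></body></html>");
var gerunds = VerbExtractor.TextByClass(doc, "gerund");
Console.WriteLine(gerunds[0]);
```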

Is there a Java equivalent for XmlDocument.LoadXml() from .NET?

In .NET C#, when trying to load a string into xml, you need to use XmlDocument type from System.Xml and do the following:
e.g:
string xmlStr = "<name>Oscar</name>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(xmlStr);
Console.Write(doc.OuterXml);
This seems simple but how can I do this in Java? Is it possible to load a string into xml using something directly, short and simple like above and avoid implementing other methods for this?
Thanks in advance.

Try this:
DocumentBuilderFactory documentBuildFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuildFactory.newDocumentBuilder();
Document document = documentBuilder.parse(new ByteArrayInputStream("<name>Oscar</name>".getBytes()));
You can read the text "Oscar" with:
String nodeText = document.getChildNodes().item(0).getTextContent();
System.out.println(nodeText);
To transform the document back to text:
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
DOMSource domSource = new DOMSource(document);
//to print the string in sysout, System.out
StreamResult streamResult = new StreamResult(System.out);
transformer.transform(domSource, streamResult );
To get the result in String:
DOMSource source = new DOMSource(document);
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
StreamResult result = new StreamResult(outStream);
transformer.transform(source, result);
String resultString = new String( outStream.toByteArray());
System.out.println(resultString);

You have a choice of tree models in Java - DOM, XOM, JDOM, DOM4J. Many people (including Singh above) use DOM by default because it is included in the JDK, but it's probably the worst of the bunch, largely because it's the oldest (it was invented before namespaces came along), and because it tries to do too much (HTML, event handling etc, as well as XML). I'd suggest using JDOM2. It shouldn't be hard if you look at the Javadoc for you to find the method that builds a JDOM2 document from an input stream.

Java has a ton of libraries for working with XML. In addition to the many classes that work with XML included with the standard Java installation, there are lots of other open source libraries available. Here are a few options. Take a look at the docs and see which one meets your needs:
The XStream library is very fast and easy to use. I've used it for XML serialization and I'm very happy with it.
If you would rather not use an external library, then try the javax.xml.parsers.DocumentBuilder class that is demonstrated here

If you want to parse a String str into a Document in Java, you can do the following:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = dbf.newDocumentBuilder();
InputStream is = new ByteArrayInputStream(str.getBytes("UTF-8"));
return docBuilder.parse(is);

Converting PDF, Doc and Docx to rtf in c#

I have a requirement for an application that takes Doc, Docx and PDF files and converts them to RTF.
The conversion is one-way; I do not need to convert back to Doc or PDF.
Has anyone done this, and can you recommend a library? I know there is Aspose, but it's way too pricey and the licenses are per year, so that's not going to work for the company I work for.
I'm OK with using a different library for each file type if that's what it takes.
Thanks in advance

Telerik has a nice library to do this. They actually have an entire editor that looks like Microsoft Word. It can open multiple file formats and it saves natively as RTF (although it can save as PDF, DOCX, etc.) The one thing I'm not sure of is opening the PDF and saving as an RTF. I'm not sure that the Telerik library can do that.
Here is a link to the library:
http://www.telerik.com/products/wpf/richtextbox.aspx
For a PDF to RTF library, you could use this:
http://www.sautinsoft.com/products/pdf-focus/index.php

GroupDocs.Conversion Cloud is a REST API that converts all common file formats from one format to another reliably and easily. Its free pricing plan offers 50 free credits per month.
Here is sample code for PDF to RTF from default storage:
// Get App Key and App SID from https://dashboard.groupdocs.cloud/
var configuration = new GroupDocs.Conversion.Cloud.Sdk.Client.Configuration(MyAppSid, MyAppKey);
var apiInstance = new ConvertApi(configuration);
try
{
    // convert settings
    var settings = new GroupDocs.Conversion.Cloud.Sdk.Model.ConvertSettings
    {
        StorageName = null,
        FilePath = "02_pages.pdf",
        Format = "rtf",
        ConvertOptions = new RtfConvertOptions(),
        OutputPath = "02_pages.rtf"
    };
    // convert to specified format
    List<StoredConvertedResult> response = apiInstance.ConvertDocument(new ConvertDocumentRequest(settings));
    Console.WriteLine("Document converted successfully: " + response[0].Url);
}
catch (Exception e)
{
    Console.WriteLine("Exception when calling ConvertApi.ConvertDocument: " + e.Message);
}
I'm a developer evangelist at Aspose.

Create HTML Document and place table in it using C# (Windows Application)

I want to create an HTML document and create a table in it using C#. I don't want to use ASP or anything like that. I want to do this from a C# Windows application.
Creating the document should not use MS Word or depend on any other app, and it should be possible to save it to any folder (C:\ etc.). It must be totally independent of any other MS product and run on any PC.

Something like this :)
string exportDirectory = @"C:\";
// Path.Combine inserts the directory separator where needed.
using (StreamWriter sw = File.CreateText(Path.Combine(exportDirectory, "filename.html")))
{
    sw.WriteLine("<table>");
    sw.WriteLine("<tr>");
    sw.WriteLine("<td>");
    sw.WriteLine("contents of table!");
    sw.WriteLine("</td>");
    sw.WriteLine("</tr>");
    sw.WriteLine("</table>");
}

This is just creating a string ( maybe using a StringBuilder ) and appending data, then save it with a StreamWriter. If you want something more versatile, try to use some sort of stringtemplate, http://www.antlr.org/wiki/display/ST/StringTemplate+Documentation for example, this could allow you to isolate the "view" portion of your application allowing some sort of configuration to easily drive the output generation.
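A minimal sketch of that StringBuilder-plus-file approach might look like the following. The HtmlTableWriter class and its method names are made up for illustration; WebUtility.HtmlEncode (from System.Net, not ASP.NET) is used so that characters such as < in the data do not break the markup.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

public static class HtmlTableWriter
{
    // Builds a complete HTML page containing a one-column table, one row per item.
    public static string BuildTable(IEnumerable<string> items)
    {
        var sb = new StringBuilder();
        sb.AppendLine("<!DOCTYPE html>");
        sb.AppendLine("<html><head><meta charset=\"utf-8\"><title>Items</title></head><body>");
        sb.AppendLine("<table>");
        foreach (var item in items)
        {
            // Encode the cell text so that <, > and & are emitted as entities.
            sb.AppendLine("<tr><td>" + WebUtility.HtmlEncode(item) + "</td></tr>");
        }
        sb.AppendLine("</table>");
        sb.AppendLine("</body></html>");
        return sb.ToString();
    }

    // Writes the page to directory/fileName as UTF-8.
    public static void Save(string directory, string fileName, IEnumerable<string> items)
    {
        File.WriteAllText(Path.Combine(directory, fileName), BuildTable(items), Encoding.UTF8);
    }
}
```

Keeping BuildTable separate from Save means the markup can be checked or reused without touching the file system.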

Well, it's not clear why you have such a requirement or why you mention MS Word; it doesn't make sense.
But what you want is for your program to create an HTML file (you don't even need ASP.NET for that; it's for web server programming).
My best guess is that you want something to emit data in HTML format according to some user requirements.
For that, simply create a file with an .html extension and start emitting HTML tags to it.
A few people have already mentioned how to do that.
A quick and dirty way would be something like this:
public void CreateFile()
{
    StringBuilder sb = new StringBuilder();
    sb.Append("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\r\n");
    sb.Append("<html xmlns=\"http://www.w3.org/1999/xhtml\">\r\n");
    sb.Append("<head>\r\n");
    sb.Append("<meta content=\"text/html; charset=utf-8\" http-equiv=\"Content-Type\" />\r\n");
    sb.Append("<title>My HTML Page</title>\r\n");
    sb.Append("</head>\r\n");
    sb.Append("<body>\r\n");
    sb.Append("<table>\r\n");
    // If you are emitting some data like a list of items
    // (Items is assumed to be a collection defined elsewhere in the class)
    foreach (var item in Items)
    {
        sb.Append("<tr>\r\n");
        sb.Append("<td>\r\n");
        sb.Append(item.ToString());
        sb.Append("</td>\r\n");
        sb.Append("</tr>\r\n");
    }
    sb.Append("</table>\r\n");
    sb.Append("</body>\r\n");
    sb.Append("</html>\r\n");
    // The using block flushes and closes the file once writing is done.
    using (System.IO.StreamWriter sw = new System.IO.StreamWriter("Your file path .html"))
    {
        sw.Write(sb.ToString());
    }
}
There is also a System.Web.UI.Html32TextWriter class that is meant specifically for this purpose, but you didn't want anything from ASP, so...
