I am trying to embed an XML file in a PDF/A-2 document using iText 7 and C#.
The XML has to be added to the PDF name tree.
I finally managed to do it, and the XML stream seems to be in the right position in the name tree.
There is still one problem: the stream is supposed to be compressed but is not. If you open the output PDF in Notepad, you see the XML as plain text.
It seems that the iText 7 PdfStream cannot do this, or at least I cannot get it to compress, even when setting the compression level to 9 with stream.SetCompressionLevel(9) in my code.
Does anyone have a clue how to do it?
Thank you very much.
Mauro
Here is my code:
using iText.Forms;
using iText.Kernel.Pdf;
using iText.Pdfa;
using iText.Layout.Element;
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
namespace PdfXml
{
class Program
{
static void Main(string[] args)
{
string pdfIn = "\\xfar\\pdfin.pdf";
string pdfOut = "\\xfar\\pdfoutU.pdf";
string cdaIn = "\\xfar\\cda.xml";
StreamReader Reader = new StreamReader(cdaIn);
var content = new StringBuilder();
string line;
while (Reader.EndOfStream == false)
{
line = Reader.ReadLine();
content.AppendLine(line);
}
byte[] bytes = Encoding.ASCII.GetBytes(content.ToString());
PdfDocument pdfDoc = new PdfDocument(new PdfReader(pdfIn), new PdfWriter(pdfOut));
PdfStream stream = new PdfStream();
PdfNameTree nameTree1 = pdfDoc.GetCatalog().GetNameTree(new PdfName("XFAResource"));
stream.SetCompressionLevel(9);
stream.SetData(bytes, true);
stream.Put(PdfName.Filter, PdfName.FlateDecode);
nameTree1.AddEntry("dataset", stream);
nameTree1.BuildTree();
nameTree1.SetModified();
// Close the document inside Main so the output file is actually written
pdfDoc.Close();
Console.WriteLine("ok");
}
}
}
}
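One thing that may be worth checking, offered as a sketch and not as a confirmed fix: the code above sets /Filter /FlateDecode by hand on a stream whose bytes are still uncompressed. As far as I can tell, iText 7 skips its own compression step when a stream already declares a Flate filter, so the raw text would be written unchanged. The example below assumes the iText 7 kernel package, builds a small PDF in memory instead of reading pdfin.pdf, and leaves the filter entry to iText. The class name, the BuildPdf helper, and the sample XML are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using iText.Kernel.Pdf;

static class CompressedStreamSketch
{
    // Builds a one-page PDF in memory whose "XFAResource" name tree
    // holds the given bytes under the key "dataset".
    public static byte[] BuildPdf(byte[] xmlBytes)
    {
        var output = new MemoryStream();
        var pdfDoc = new PdfDocument(new PdfWriter(output));
        pdfDoc.AddNewPage();

        // Hand over the raw bytes and the desired level.
        // No /Filter entry is set manually (assumption: iText adds it when it compresses).
        var stream = new PdfStream(xmlBytes, CompressionConstants.BEST_COMPRESSION);

        PdfNameTree nameTree = pdfDoc.GetCatalog().GetNameTree(new PdfName("XFAResource"));
        nameTree.AddEntry("dataset", stream);

        // The objects are serialized when the document is closed
        pdfDoc.Close();
        return output.ToArray();
    }

    static void Main()
    {
        // Hypothetical sample payload, long enough for compression to matter
        string xml = "<ClinicalDocument>" + new string('x', 2000) + "</ClinicalDocument>";
        byte[] pdf = BuildPdf(Encoding.UTF8.GetBytes(xml));

        // Crude check: look for the XML tag in the raw PDF bytes
        string raw = Encoding.ASCII.GetString(pdf);
        Console.WriteLine(raw.Contains("<ClinicalDocument>")
            ? "XML is visible as plain text"
            : "XML is not visible as plain text");
    }
}
```

Separately from compression, reading the file with File.ReadAllBytes instead of Encoding.ASCII.GetBytes would keep any non-ASCII characters in the XML intact.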
Related
I want to convert an html string into a byte[] pdf and send it as a file for download. But all the libraries I see either have dependency issues or won't work and give errors like "System.Drawing.Common is not supported on this platform.".
I'm working on Ubuntu with .NET version 7.0.102.
Some packages give warnings like this:
"Package 'PDFsharp 1.32.3057' was restored using '.NETFramework,Version=v4.6.1, .NETFramework,Version=v4.6.2, .NETFramework,Version=v4.7, .NETFramework,Version=v4.7.1, .NETFramework,Version=v4.7.2, .NETFramework,Version=v4.8, .NETFramework,Version=v4.8.1' instead of the project target framework 'net7.0'. This package may not be fully compatible with your project."
This stops me from using the hot reload feature with dotnet watch; it asks me to restart the server on every change.
You can use the iTextSharp library for this.
See the code snippet below:
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;
namespace HTMLToPDFExample
{
class Program
{
static void Main(string[] args)
{
// Your string HTML here
string html = @"<html><body>
<h1>Example HTML to PDF Conversion</h1>
<p>This is an example of how to convert an HTML string to a PDF document.</p>
</body></html>";
string pdfFile = "example.pdf";
using (var ms = new MemoryStream())
{
using (var doc = new Document(PageSize.A4))
{
using (var writer = PdfWriter.GetInstance(doc, ms))
{
doc.Open();
using (var sr = new StringReader(html))
{
var htmlContext = new HTMLWorker(doc);
htmlContext.Parse(sr);
}
doc.Close();
}
}
File.WriteAllBytes(pdfFile, ms.ToArray());
}
}
}
}
When creating a spreadsheet with the OpenXML SpreadsheetDocument class in C#.Net, the Authors and Last saved by fields are set to "James Westgate".
How do I clear or overwrite James' name?
SpreadsheetDocument doc = SpreadsheetDocument.Open(stream, true);
doc.PackageProperties.Creator = "sh";
...is not working for me.
Update:
using System;
using System.Windows.Forms;
using System.IO;
using DocumentFormat.OpenXml.Extensions;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
namespace OpenXMLProps
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
public MemoryStream Execute()
{
MemoryStream stream = SpreadsheetReader.Create();
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(stream, true))
{
WorksheetPart worksheetPart = SpreadsheetReader.GetWorksheetPartByName(doc, "Sheet1");
WorksheetWriter writer = new WorksheetWriter(doc, worksheetPart);
doc.PackageProperties.Creator = "Finbar mahoolahan";
SpreadsheetWriter.Save(doc);
return new MemoryStream(stream.ToArray());
}
}
private void button1_Click(object sender, EventArgs e)
{
using (FileStream fs = new FileStream("excel_test.xlsx", FileMode.Create, FileAccess.Write))
{
MemoryStream excel_stream = Execute();
excel_stream.WriteTo(fs);
}
}
}
}
This works for me (after NuGet-ing in the OpenXml package):
using (var doc = SpreadsheetDocument.Open(@"C:\tmp\MyExcelFile.xlsx", true))
{
var props = doc.PackageProperties;
props.Creator = "Flydog57";
props.LastModifiedBy = "Flydog57";
doc.Save();
}
It even worked the first time I tried it! That's an unusual occurrence.
I believe (without looking at the code, which is probably 8 years old by now) that there is a base document that serves as the template, and that it probably has my name associated with it. This document is embedded as a resource in the DLL.
I suggest downloading the original SimpleOOXml source code from CodePlex and modifying that base document. You could also recompile against a modern version of the OpenXML library.
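To see which names a generated workbook actually carries, it can help to read the core properties part directly. An .xlsx file is a ZIP package, and the author fields conventionally live in docProps/core.xml as dc:creator and cp:lastModifiedBy. The sketch below uses only System.IO.Compression and System.Xml.Linq. The part path is the conventional one (strictly speaking it is resolved through the package relationships), and the in-memory sample package and all helper names are stand-ins for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

static class CorePropertiesSketch
{
    public static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";
    public static readonly XNamespace Cp =
        "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";

    // Returns the text of one core property (for example Dc + "creator"),
    // or null when the part or the element is missing.
    public static string ReadCoreProperty(Stream package, XName property)
    {
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry entry = zip.GetEntry("docProps/core.xml");
            if (entry == null) return null;
            using (Stream part = entry.Open())
            {
                XElement root = XDocument.Load(part).Root;
                return (string)root.Element(property);
            }
        }
    }

    // Builds a minimal stand-in package that holds only the core properties part.
    public static MemoryStream BuildSample(string creator, string lastModifiedBy)
    {
        var ms = new MemoryStream();
        using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, true))
        {
            var core = new XElement(Cp + "coreProperties",
                new XAttribute(XNamespace.Xmlns + "cp", Cp),
                new XAttribute(XNamespace.Xmlns + "dc", Dc),
                new XElement(Dc + "creator", creator),
                new XElement(Cp + "lastModifiedBy", lastModifiedBy));
            using (var writer = new StreamWriter(
                zip.CreateEntry("docProps/core.xml").Open(), new UTF8Encoding(false)))
            {
                writer.Write(core.ToString());
            }
        }
        ms.Position = 0;
        return ms;
    }

    static void Main()
    {
        using (MemoryStream sample = BuildSample("James Westgate", "James Westgate"))
        {
            Console.WriteLine("creator: " + ReadCoreProperty(sample, Dc + "creator"));
            sample.Position = 0;
            Console.WriteLine("lastModifiedBy: " + ReadCoreProperty(sample, Cp + "lastModifiedBy"));
        }
    }
}
```

For a real workbook, passing File.OpenRead("excel_test.xlsx") in place of the sample stream should show whether the template's author survived the save.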
I'm trying to add HTML content to a DOCX file using the OpenXML altChunk approach in C#. The sample code below works fine and appends the HTML content to the end of the document. My requirement is to add the HTML content at a specific place in the document: inside a table cell, inside a paragraph, in place of a specific string, or at placeholders marked with content controls. Can you please point me to a sample or share a few suggestions? Please let me know if you need more info.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;
using DocumentFormat.OpenXml.Wordprocessing;
using DocumentFormat.OpenXml;
using System.Xml;
namespace Docg2
{
class Program
{
static void Main(string[] args)
{
testaltchunk();
}
public static void testaltchunk()
{
XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
XNamespace r = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
using (WordprocessingDocument myDoc = WordprocessingDocument.Open("../../Test3.docx", true))
{
string html =
@"<html>
<head/>
<body>
<h1>Html Heading</h1>
<p>This is an html document in a string literal.</p>
</body>
</html>";
string altChunkId = "AltChunkId1";
MainDocumentPart mainPart = myDoc.MainDocumentPart;
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart("application/xhtml+xml", altChunkId);
using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
using (StreamWriter stringStream = new StreamWriter(chunkStream))
stringStream.Write(html);
XElement altChunk = new XElement(w + "altChunk", new XAttribute(r + "id", altChunkId));
XDocument mainDocumentXDoc = GetXDocument(myDoc);
mainDocumentXDoc.Root
.Element(w + "body")
.Elements(w + "p")
.Last()
.AddAfterSelf(altChunk);
SaveXDocument(myDoc, mainDocumentXDoc);
}
}
private static void SaveXDocument(WordprocessingDocument myDoc, XDocument mainDocumentXDoc)
{
// Serialize the XDocument back into the part
using (var str = myDoc.MainDocumentPart.GetStream(FileMode.Create, FileAccess.Write))
using (var xw = XmlWriter.Create(str))
mainDocumentXDoc.Save(xw);
}
private static XDocument GetXDocument(WordprocessingDocument myDoc)
{
// Load the main document part into an XDocument
XDocument mainDocumentXDoc;
using (var str = myDoc.MainDocumentPart.GetStream())
using (var xr = XmlReader.Create(str))
mainDocumentXDoc = XDocument.Load(xr);
return mainDocumentXDoc;
}
}
}
To expand on my comment a little bit: You really shouldn't be manipulating the document XML yourself. You lose all the benefits of using OpenXML in the first place. Thus, your code could be re-written like this:
static void Main(string[] args)
{
using (WordprocessingDocument myDoc = WordprocessingDocument.Open("../../Test3.docx", true))
{
string html =
@"<html>
<head/>
<body>
<h1>Html Heading</h1>
<p>This is an html document in a string literal.</p>
</body>
</html>";
string altChunkId = "AltChunkId1";
MainDocumentPart mainPart = myDoc.MainDocumentPart;
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart("application/xhtml+xml", altChunkId);
using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
using (StreamWriter stringStream = new StreamWriter(chunkStream))
stringStream.Write(html);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
// this inserts altChunk after the last Paragraph
mainPart.Document.Body
.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
mainPart.Document.Save();
}
}
Now, it becomes clear that you can insert your AltChunk after, or before, or inside any element in the document, as long as you can find the element. That part will depend on what you're searching for.
If you're searching for a specific table, then search for a DocumentFormat.OpenXml.Wordprocessing.Table etc. Here is one example of how to search for a specific table in a document: Find a specific Table (after a bookmark) in open xml
Here's an example of replacing a content control: https://msdn.microsoft.com/en-us/library/cc197932(v=office.12).aspx
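As a rough illustration of the "find the element, then insert next to it" idea, the sketch below looks for the first paragraph whose text contains a placeholder string and puts an AltChunk in its place. It assumes the Open XML SDK package, creates a throwaway document in memory instead of opening Test3.docx, and uses a made-up placeholder ({{HTML_HERE}}) and helper name. The placeholder paragraph is emptied, not removed, on the assumption that it might sit in a table cell, which still needs a closing paragraph.

```csharp
using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class AltChunkPlaceholderSketch
{
    // Inserts an altChunk for 'html' before the first paragraph containing
    // 'placeholder' and clears that paragraph's runs.
    // Returns false when no such paragraph exists.
    public static bool ReplacePlaceholder(
        WordprocessingDocument doc, string placeholder, string html, string altChunkId)
    {
        MainDocumentPart mainPart = doc.MainDocumentPart;

        // Descendants also reaches paragraphs nested in table cells
        Paragraph target = mainPart.Document.Body
            .Descendants<Paragraph>()
            .FirstOrDefault(p => p.InnerText.Contains(placeholder));
        if (target == null) return false;

        AlternativeFormatImportPart chunk =
            mainPart.AddAlternativeFormatImportPart("application/xhtml+xml", altChunkId);
        using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
        using (var writer = new StreamWriter(chunkStream))
        {
            writer.Write(html);
        }

        var altChunk = new AltChunk();
        altChunk.Id = altChunkId;
        target.InsertBeforeSelf(altChunk);

        // Drop the placeholder text but keep the (now empty) paragraph
        target.RemoveAllChildren<Run>();
        mainPart.Document.Save();
        return true;
    }

    static void Main()
    {
        using (var ms = new MemoryStream())
        using (WordprocessingDocument doc =
            WordprocessingDocument.Create(ms, WordprocessingDocumentType.Document))
        {
            // Throwaway document with one ordinary and one placeholder paragraph
            MainDocumentPart mainPart = doc.AddMainDocumentPart();
            mainPart.Document = new Document(new Body(
                new Paragraph(new Run(new Text("Intro"))),
                new Paragraph(new Run(new Text("{{HTML_HERE}}")))));

            bool replaced = ReplacePlaceholder(
                doc, "{{HTML_HERE}}",
                "<html><body><h1>Html Heading</h1></body></html>", "AltChunkId1");
            Console.WriteLine("placeholder replaced: " + replaced);
        }
    }
}
```

The same pattern should carry over to other anchors: swap the paragraph query for a search over Table, TableCell, or SdtBlock elements, depending on what marks the insertion point.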
I have some documents in my TFS project. I want to create a console application that reads the documents from TFS and copies the files to my local storage. Any ideas?
Check the code in this article, which should work for you:
using System;
using Microsoft.TeamFoundation.Client;
using Microsoft.TeamFoundation.VersionControl.Client;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string teamProjectCollectionUrl = "http://YourTfsUrl:8080/tfs/YourTeamProjectCollection";
string filePath = @"C:\project\myfile.cs";
// Get the version control server
TfsTeamProjectCollection teamProjectCollection = TfsTeamProjectCollectionFactory.GetTeamProjectCollection(new Uri(teamProjectCollectionUrl));
VersionControlServer versionControlServer = teamProjectCollection.GetService<VersionControlServer>();
// Get the latest Item for filePath
Item item = versionControlServer.GetItem(filePath, VersionSpec.Latest);
// Download and display content to console
string fileString = string.Empty;
using (Stream stream = item.DownloadFile())
{
using (MemoryStream memoryStream = new MemoryStream())
{
stream.CopyTo(memoryStream);
// Use StreamReader to read MemoryStream created from byte array
using (StreamReader streamReader = new StreamReader(new MemoryStream(memoryStream.ToArray())))
{
fileString = streamReader.ReadToEnd();
}
}
}
Console.WriteLine(fileString);
Console.ReadLine();
}
}
}
By the way, you can also use the tf get command to download a specified version of one or more files or folders from TFS to the workspace, which is an easier way.
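Since the question asks for a copy on local storage and the answer's code prints the content to the console, here is a small sketch of the missing step: writing a stream to a file. It uses only the standard library, with a MemoryStream standing in for the stream returned by item.DownloadFile(). The helper name and the destination path are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

static class SaveToDiskSketch
{
    // Copies everything from 'source' into a new file at 'destinationPath',
    // creating the target folder first if it does not exist.
    public static void SaveStreamToFile(Stream source, string destinationPath)
    {
        string folder = Path.GetDirectoryName(Path.GetFullPath(destinationPath));
        Directory.CreateDirectory(folder);

        using (FileStream target = File.Create(destinationPath))
        {
            source.CopyTo(target);
        }
    }

    static void Main()
    {
        // Stand-in for: using (Stream source = item.DownloadFile())
        using (Stream source = new MemoryStream(Encoding.UTF8.GetBytes("file content from TFS")))
        {
            string destination = Path.Combine(Path.GetTempPath(), "tfs-copy", "myfile.cs");
            SaveStreamToFile(source, destination);
            Console.WriteLine("saved to " + destination);
        }
    }
}
```

Copying the stream directly avoids the round trip through a string, so binary documents such as .docx or .pdf files are written byte for byte.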
I want to read the callout text boxes in a PDF. I'm using iTextSharp to iterate through all the annotations as follows:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.IO;
namespace PDFAnnotationReader
{
class Program
{
static void Main(string[] args)
{
StringBuilder text = new StringBuilder();
string fileName = @"C:\Users\J123\Desktop\xyz.pdf";
PdfReader pdfReader = new PdfReader(fileName);
PdfDictionary pageDict = pdfReader.GetPageN(1);
PdfArray annotArray = pageDict.GetAsArray(PdfName.ANNOTS);
for (int i=0;i<annotArray.Size;i++)
{
PdfDictionary curAnnot = annotArray.GetAsDict(i);
}
}
}
}
Examining the hashMap of curAnnot, I see that when I get to an annotation that is a callout text box, the dictionary includes the following key-value pairs:
{[/IT,/FreeTextCallout]}
{[/Contents,xyz this is a callout]}
So I think what I should do is check each annotation to see if it includes the key /IT with the value /FreeTextCallout and if so, get the value of /Contents as a string like so:
if (curAnnot.Contains(PdfName.IT))
{
if (curAnnot.Get(PdfName.IT)==PdfName.FREETEXTCALLOUT)
{
Console.WriteLine(curAnnot.Get(PdfName.CONTENTS).ToString());
}
}
But there doesn't seem to be a PdfName.IT or PdfName.FREETEXTCALLOUT. How do I check for the existence of /IT and retrieve its value?
You can create your own PdfName objects using the constructor on PdfName:
new PdfName("IT");
So:
var myPdfNameIT = new PdfName("IT");
if (curAnnot.Contains(myPdfNameIT)) {
//...
}
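Putting the pieces together, a fuller version of the check might look like the sketch below. It assumes iTextSharp 5 and uses a hand-built dictionary in place of curAnnot so that it does not need a PDF file. The class and helper names are made up. The comparison uses Equals because, as far as I know, == on two PdfName objects compares references, not name values.

```csharp
using System;
using iTextSharp.text.pdf;

static class CalloutCheckSketch
{
    // Names that have no predefined constant on PdfName
    static readonly PdfName IntentKey = new PdfName("IT");
    static readonly PdfName FreeTextCallout = new PdfName("FreeTextCallout");

    // Returns the /Contents text when the annotation is a callout text box,
    // otherwise null.
    public static string GetCalloutText(PdfDictionary annot)
    {
        PdfName intent = annot.GetAsName(IntentKey);
        if (intent == null || !intent.Equals(FreeTextCallout))
        {
            return null;
        }

        PdfString contents = annot.GetAsString(PdfName.CONTENTS);
        return contents == null ? null : contents.ToUnicodeString();
    }

    static void Main()
    {
        // Hand-built stand-in for one entry of annotArray
        var annot = new PdfDictionary();
        annot.Put(new PdfName("IT"), new PdfName("FreeTextCallout"));
        annot.Put(PdfName.CONTENTS, new PdfString("xyz this is a callout"));

        Console.WriteLine(GetCalloutText(annot) ?? "(not a callout)");
    }
}
```

Inside the loop from the question, the call would be GetCalloutText(curAnnot) for each annotation, skipping the ones that return null.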