WPF to XPS with outline and structure - c#

I am developing a WPF reporting application.
My report is built as a WPF control (FlowDocument or FixedDocument) and contains tables. I want to save it as XPS while preserving its structure, meaning that a table can be copied as a table rather than as plain text, as explained in this article. I found a way to save a WPF control with XpsDocumentWriter or XpsSerializationManager, but the result has no structure or outline. Is it possible to save a WPF control as XPS while preserving its structure?

XPS is a fixed-document format, and WPF lets you save a FlowDocument as a FixedDocument in an XPS file. Anything beyond that requires code; this article goes further:
Convert XAML Flow Document to XPS with Style (multiple page, page size, header, margin)

It seems that there is no way to preserve WPF element semantics when serializing it with XpsDocumentWriter or XpsSerializationManager.
The only way to construct a document with structure is to use the low-level API in the System.Windows.Xps.Packaging namespace, as described in this article. With this API you can obtain an XmlWriter for constructing the FixedPage content:
XpsDocument document = new XpsDocument(destFileName, FileAccess.ReadWrite);
IXpsFixedDocumentSequenceWriter docSeqWriter = document.AddFixedDocumentSequence();
IXpsFixedDocumentWriter docWriter = docSeqWriter.AddFixedDocument();
IXpsFixedPageWriter pageWriter = docWriter.AddFixedPage();
XmlWriter xmlWriter = pageWriter.XmlWriter;
and a Stream for writing the document structure:
XpsResource storyFragments = pageWriter.AddStoryFragment();
Stream stream = storyFragments.GetStream();
Although the System.Windows.Documents.DocumentStructures namespace contains classes representing the StoryFragments element and its children, you cannot use them while writing to the resource stream.
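For illustration, the structure part for a table could be written with a plain XmlWriter over that stream. This is a minimal sketch, not taken from the article: the class name, method name, story name, and cell names are hypothetical, and the element names and namespace URI follow my reading of the XPS document structure schema, so check them against the specification. Each NameReference is assumed to match a Name attribute on an element written to the FixedPage markup. The demo writes to a MemoryStream so that it runs without WPF.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class StoryFragmentsSketch
{
    // Namespace of the XPS document structure schema (assumption: verify against the XPS spec).
    private const string Ns = "http://schemas.microsoft.com/xps/2005/06/documentstructure";

    // Writes a StoryFragments part describing one table.
    // cellNames[r][c] is the (hypothetical) Name of the FixedPage element shown in that cell.
    public static void WriteTable(Stream stream, string storyName, string[][] cellNames)
    {
        var settings = new XmlWriterSettings { Indent = true, Encoding = new UTF8Encoding(false) };
        using (XmlWriter w = XmlWriter.Create(stream, settings))
        {
            w.WriteStartElement("StoryFragments", Ns);
            w.WriteStartElement("StoryFragment", Ns);
            w.WriteAttributeString("StoryName", storyName);
            w.WriteAttributeString("FragmentType", "Content");
            w.WriteStartElement("TableStructure", Ns);
            w.WriteStartElement("TableRowGroupStructure", Ns);
            foreach (string[] row in cellNames)
            {
                w.WriteStartElement("TableRowStructure", Ns);
                foreach (string name in row)
                {
                    w.WriteStartElement("TableCellStructure", Ns);
                    w.WriteStartElement("ParagraphStructure", Ns);
                    w.WriteStartElement("NamedElement", Ns);
                    w.WriteAttributeString("NameReference", name);
                    w.WriteEndElement(); // NamedElement
                    w.WriteEndElement(); // ParagraphStructure
                    w.WriteEndElement(); // TableCellStructure
                }
                w.WriteEndElement(); // TableRowStructure
            }
            w.WriteEndElement(); // TableRowGroupStructure
            w.WriteEndElement(); // TableStructure
            w.WriteEndElement(); // StoryFragment
            w.WriteEndElement(); // StoryFragments
        }
    }

    public static void Main()
    {
        var cells = new[]
        {
            new[] { "Cell_0_0", "Cell_0_1" },
            new[] { "Cell_1_0", "Cell_1_1" },
        };
        using (var ms = new MemoryStream())
        {
            WriteTable(ms, "Story1", cells);
            Console.WriteLine(Encoding.UTF8.GetString(ms.ToArray()));
        }
    }
}
```

In the XPS case you would pass the stream returned by GetStream() instead of the MemoryStream.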

Related

Is there a way to get the full path (or even just the file name) of a PdfDocument object using itext7 and C#?

I have a list of PdfDocument objects that were split from one large original PDF file by page number (one doc for each page in the original).
All I want to do is iterate those documents
foreach (PdfDocument doc in splitDocuments)
And get the file path/doc name of each, but I have not been able to find a property that gives me that basic info. I need to copy the files to a different location based on some info inside the file, but so far I have not been able to get the file path.
Thanks for any help!
In general there is no unique file name for a PdfDocument.
First of all, recall that a PdfDocument can be created from a PdfReader, a PdfWriter, or a PdfReader / PdfWriter pair. Both the reader and the writer may be file based, so in general it is not clear what you mean by "the" file name: there may be two. Both may also be based on a non-file object, in which case there is no file name at all.
If the PdfWriter is file based, you can determine the associated file name from the underlying FileStream, which for a PdfDocument doc is
doc.GetWriter().GetOutputStream()
For the PdfReader the equivalent is not easy (if possible at all): Depending on the options used the file might have been read into a byte array and then forgotten. But even otherwise, i.e. if the reader still is based on the file stream, it is hidden under two or three layers of internal or private information. Using reflection you can work through those layers but that of course wouldn't be appropriate for production purposes.
Also you mention those PdfDocument objects were split from one large original PDF. If that happened by means of the PdfSplitter utility class, those documents don't have a PdfReader to start with.

iTextSharp Mixed Alignments in Same Object

I want to have a function that returns one object containing two paragraphs with different alignments. This is easy to do manually by making them separate paragraphs and adding them to the PDF one at a time, but I would like my function to return them as a single object to be added to a PDF. Is this possible? As an example of what I want:
someTextHere
someMoreTextHere
But as one object which I can then add to a pdf.
I have created a small standalone iText 7 example that creates the following output:
The PDF file shown in the screen shot was created like this:
public void createPdf(String dest) throws IOException {
    PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
    Document document = new Document(pdf);
    Div div = new Div()
        .add(new Paragraph("Left").setTextAlignment(TextAlignment.LEFT))
        .add(new Paragraph("Right").setTextAlignment(TextAlignment.RIGHT))
        .setBackgroundColor(ColorConstants.GRAY)
        .setWidth(200);
    document.add(div);
    document.close();
}
As you can see, I created a Div element (similar to a <div> tag in HTML) to which I added two Paragraph objects with a different text alignment. That seems to be exactly what you need.
I am not a C# developer, hence I provide the code in Java. However, if you're proficient in C#, you shouldn't have any problem porting it from Java to C# (it's just a matter of changing lowercases into uppercases, such as changing add() into Add()).
Note that this is iText 7 code; if you're still using iText 5, you should consider upgrading to the latest iText version, since iText 5 went into maintenance mode a while ago. Maintenance mode means that development on that version has stopped; it's no longer supported for users who aren't customers.

How to fill forms like this using iText for .NET

I am trying to fill in the name and address in each of the boxes using
cb.SetTextMatrix(x, y); // x and y positions
cb.ShowText("data");
but it fails to do so.
the problem
The code you are using isn't entirely incorrect, but it has several flaws. For starters: you don't know the value of the x and y parameters, and that's kind of crucial if you want the text to be in the correct position.
Also: you are writing PDF syntax directly into the content stream. In your snippet, you forgot to create the text object (with cb.BeginText() and cb.EndText()). If you are new at PDF, you shouldn't try writing PDF syntax directly into the content stream unless you have a solid understanding of ISO-32000-1. Have you ever read ISO-32000-1? If not, then why are you using low-level operations? That doesn't make much sense, does it? There are helper classes such as ColumnText to add content at absolute positions.
Looking at the screen shot you shared, I see that some fields require "Comb" functionality. This functionality makes sure that each small box contains exactly one glyph (if you don't know what a glyph is, think of it as the visual representation of a character).
If you want to make it easy on yourself, you should test if the form is interactive first. Answer this question:
Does the form contain AcroFields?
If the answer is "Yes", fill out the form using the AcroFields object. You can find out which field names to use by following the instructions in the answer to this question: How do I enumerate all the fields in a PDF file in ITextSharp?
If the answer is "No", open the file in Adobe Acrobat, and manually add fields. Define the fields as Comb fields so that each box contains exactly one glyph. To get a nice-looking result, select a monospaced font such as Courier (using a proportional font will probably give you an uglier result). This operation adds AcroForm fields.
Once you have an interactive form with AcroFields (assuming you have defined them correctly), filling out the form is as easy as this in iText 5:
PdfReader reader = new PdfReader(template);
PdfStamper stamper = new PdfStamper(reader,
new FileStream(newFile, FileMode.Create));
AcroFields form = stamper.AcroFields;
form.SetField(key1, value1);
form.SetField(key2, value2);
form.SetField(key3, value3);
...
stamper.Close();
See How to create and fill out a PDF form
However, since you are new at all of this, I recommend that you use iText 7 as described in the jump-start tutorial:
PdfDocument pdf = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
PdfAcroForm form = PdfAcroForm.GetAcroForm(pdf, true);
IDictionary<String, PdfFormField> fields = form.GetFormFields();
PdfFormField toSet;
fields.TryGetValue(key1, out toSet);
toSet.SetValue(value1);
fields.TryGetValue(key2, out toSet);
toSet.SetValue(value2);
...
pdf.Close();
If you want to remove the interactivity after filling out the form, you need to flatten the fields. This is also documented on the official web site.

How can I parse through a table in a pdf file?

I have a custom table with name, first name, place of birth and place of residence in a PDF file which I want to parse in C#. One of the simplest ways of doing it would be:
using (PdfLoadedDocument document = new PdfLoadedDocument("foobar"))
{
    for (var i = 0; i < document.Pages.Count; i++)
    {
        Console.WriteLine($"============ PAGE NO. {i+1} ============");
        Console.WriteLine(document.Pages[i].ExtractText());
    }
}
But the problem is the output:
============ PAGE NO. 38 ============
John L.SmithSan Francisco5400 Baden
There's no way I can separate this with a regex, so I need a way to parse each column of each row in order to get all the customer values separately. How can I parse a table in a PDF file with Syncfusion?
You will need a method that returns the coordinates of each character found in the PDF. Then you have some math to do (basically computing the distance between characters) in order to know whether a character is part of a word and where the word itself is located along the x-axis. It requires quite a lot of work and effort, and I didn't find such a method in the Syncfusion documentation.
I wrote a class which does what you want, but it is for Java projects:
PDFLayoutTextStripper (upon PDFBox)
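The distance computation described above can be sketched as follows. This is only an illustration under stated assumptions: the Glyph type, the RowSplitter class, and the gap threshold are hypothetical, and the code assumes you already have the characters of one row with their x positions and widths from whatever extraction API you use. It is not how PDFLayoutTextStripper is implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical container for one extracted character and its horizontal extent.
public sealed class Glyph
{
    public Glyph(char ch, double x, double width) { Ch = ch; X = x; Width = width; }
    public char Ch { get; }
    public double X { get; }
    public double Width { get; }
}

public static class RowSplitter
{
    // Splits the glyphs of one text row into cells wherever the horizontal gap
    // between two neighbouring glyphs is larger than gapThreshold.
    public static List<string> SplitRow(IEnumerable<Glyph> row, double gapThreshold)
    {
        var cells = new List<string>();
        var current = new StringBuilder();
        double? previousRight = null;

        foreach (Glyph glyph in row.OrderBy(item => item.X))
        {
            // A large gap since the previous glyph's right edge starts a new cell.
            if (previousRight.HasValue && glyph.X - previousRight.Value > gapThreshold)
            {
                cells.Add(current.ToString());
                current.Clear();
            }
            current.Append(glyph.Ch);
            previousRight = glyph.X + glyph.Width;
        }
        if (current.Length > 0) cells.Add(current.ToString());
        return cells;
    }

    public static void Main()
    {
        // Two words laid out with 5-unit-wide glyphs and a 30-unit gap between them.
        var row = new List<Glyph>();
        double x = 0;
        foreach (char c in "John") { row.Add(new Glyph(c, x, 5)); x += 5; }
        x += 30;
        foreach (char c in "Baden") { row.Add(new Glyph(c, x, 5)); x += 5; }

        Console.WriteLine(string.Join(" | ", SplitRow(row, 10))); // prints "John | Baden"
    }
}
```

A fixed threshold is the simplest choice; real tables often need a threshold derived from the font size or from column positions observed across several rows.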
The Syncfusion control extracts text based on the structure of the content in the PDF document. So, with the current implementation of the Syncfusion control, we cannot recognize the rows and columns of a table in the PDF document.
Also, it is not possible to extract the text in the same order as it is displayed, since the content of a PDF document follows a fixed layout.
But we can populate the table of the PDF document in Excel using Tabula (an open source library). I have modified Tabula Java (open source) to achieve layout-based text extraction from the PDF document, based on your requirement.
Please find the sample for this implementation at the link below:
http://www.syncfusion.com/downloads/support/directtrac/171585/ze/TextExtractionSample649531336
Kindly ensure the following things before executing the sample:
Install Java Runtime Environment (JRE) from the below link.
http://www.oracle.com/technetwork/java/javase/downloads/
Restart your machine.
Execute the above sample.
Try this and check whether it meets your requirement.

OpenXML: Anyway to see if a Word Document fits one page

While I doubt it, if I open a Word document using the OpenXML SDK in C# and add some info, is there any way for me to see if it still fits on one page?
If it doesn't, I want to reduce the font size of specific items I added until it fits.
I could write this algorithm if I had the current size in relation to page size with margins and all that.
I ran across this example on another site, don't know if it'll work in your case, as it requires the Office PIA...
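// Requires a reference to the Office primary interop assemblies (PIA).
using Word = Microsoft.Office.Interop.Word;
var app = new Word.Application();
var doc = app.Documents.Open("path/to/file");
doc.Repaginate();
// BuiltInDocumentProperties is late-bound; this lookup assumes dynamic binding (embedded interop types).
int pageNumber = (int)doc.BuiltInDocumentProperties["Number of Pages"].Value;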
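Given some way to obtain the page count, the shrinking loop from the question could look roughly like this. It is a sketch only: the FontFitter class and the countPagesAt delegate are hypothetical, and the delegate stands in for "apply this font size to the added items, repaginate, and return the page count", which is not implemented here.

```csharp
using System;

public static class FontFitter
{
    // Tries font sizes from startSize down to minSize in steps of step and returns
    // the first (largest) size for which the document fits on one page, or null if none does.
    // countPagesAt is a hypothetical callback that applies the size and reports the page count.
    public static double? FindLargestFittingSize(
        double startSize, double minSize, double step, Func<double, int> countPagesAt)
    {
        if (step <= 0) throw new ArgumentOutOfRangeException(nameof(step));

        for (double size = startSize; size >= minSize; size -= step)
        {
            if (countPagesAt(size) <= 1)
            {
                return size;
            }
        }
        return null;
    }

    public static void Main()
    {
        // Stand-in for the real measurement: pretend the text spills onto a
        // second page whenever the font size is above 10 points.
        Func<double, int> fakePageCount = size => size > 10 ? 2 : 1;

        double? result = FindLargestFittingSize(12, 8, 0.5, fakePageCount);
        Console.WriteLine(result.HasValue ? result.Value.ToString() : "does not fit");
    }
}
```

Each call to the callback implies a full repagination, so a coarse step (or a binary search over the size range) keeps the number of round trips small.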
