OpenXML: Any way to see if a Word document fits on one page? (C#)

I doubt it, but if I open a Word document using the OpenXML SDK in C# and add some info, is there any way for me to see whether it still fits on one page?
If it doesn't, I want to reduce the font size of specific items I added until it fits.
I could write this algorithm if I had the current content size in relation to the page size, with margins and all that.

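I ran across this example on another site. I don't know if it'll work in your case, as it requires the Office PIA and a Word installation:
using Word = Microsoft.Office.Interop.Word;

var app = new Word.Application();
var doc = app.Documents.Open("path/to/file");
doc.Repaginate(); // force Word to recompute the page breaks
// The built-in property collection is late-bound, hence dynamic.
dynamic properties = doc.BuiltInDocumentProperties;
int pageNumber = (int)properties["Number of Pages"].Value;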
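If a page count like the one above is available, the shrink-until-it-fits loop from the question could be sketched roughly as follows. This is only a sketch: `FontFitter`, `LargestSizeThatFits` and the `pagesAtSize` callback are hypothetical names, and the callback is assumed to apply the given font size to the added items, repaginate, and return the resulting page count.

```csharp
using System;

static class FontFitter
{
    // Tries font sizes from startSize downwards and returns the first
    // (largest) size for which the document fits on one page, or null
    // if even minSize does not fit.
    // pagesAtSize is a hypothetical callback: it should apply the size to
    // the added items, repaginate, and return the resulting page count.
    public static float? LargestSizeThatFits(
        Func<float, int> pagesAtSize, float startSize, float minSize, float step)
    {
        if (step <= 0)
            throw new ArgumentOutOfRangeException(nameof(step));

        for (float size = startSize; size >= minSize; size -= step)
        {
            if (pagesAtSize(size) <= 1)
                return size;
        }
        return null;
    }
}
```

With Word Interop, the callback might set the font size on the range that was added, call `doc.Repaginate()`, and return the page count. If repagination turns out to be slow, a binary search over the size range would need fewer calls than this linear scan.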

Related

Get page number from Word document

I'm using GemBox.Document and I need to find out which page my bookmark is located on inside the Word document. Can this be done?
If not, can I find out the page on which some specific text is located?
I can find both the bookmark and the text, but I don't see any option that lets me get the page number from them.
DocumentModel document = DocumentModel.Load("My Document.docx");
Bookmark bookmark = document.Bookmarks["My Bookmark"];
ContentRange content = document.Content.Find("My Text").First();

This is a somewhat uncommon task for Word files. These files do not have a page concept themselves: they are of a flow-document type, and the page concept is specific to the application that renders them (like Microsoft Word).
The flow-document types (DOC, DOCX, RTF, HTML, etc.) define content in a flowable manner, which is designed for easier editing.
The fixed-document types (PDF, XPS, etc.), on the other hand, have a page concept because the content is fixed: the file specifies on which page and at which location each piece of content is rendered, so that it looks the same in any application and on any screen.
Nevertheless, here is how you can obtain the page number from some ContentPosition using GemBox.Document:
static int GetPageNumber(ContentPosition position)
{
    DocumentModel document = position.Parent.Document;

    // Insert a temporary PAGE field at the requested position.
    Field pageField = new Field(document, FieldType.Page);
    Field importedPageField = position.InsertRange(pageField.Content).Parent as Field;

    // Paginate the document so that the field gets its value.
    document.GetPaginator(new PaginatorOptions() { UpdateFields = true });

    int pageNumber = int.Parse(importedPageField.Content.ToString());
    importedPageField.Content.Delete();
    return pageNumber;
}
Also, here is how you can use it:
DocumentModel document = DocumentModel.Load("My Document.docx");
Bookmark bookmark = document.Bookmarks["My Bookmark"];
ContentRange content = document.Content.Find("My Text").First();
int bookmarkPageNumber = GetPageNumber(bookmark.Start.Content.Start);
int contentPageNumber = GetPageNumber(content.Start);
Last, note that the GetPaginator method is a somewhat heavy operation (basically, it is similar to saving the whole document to PDF), so it can be expensive when you have a rather large document.
So, if you need to use GetPageNumber multiple times (for example, to find out the page number of each bookmark that you have), you should consider changing the code so that you first import all the page fields that you need, then call the GetPaginator method just once, and then read the content of all those page fields.
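The batching idea described above might look roughly like this. It reuses only the GemBox.Document calls from GetPageNumber; the `PageNumbers` class and `GetBookmarkPageNumbers` method are hypothetical names, and it assumes that `Bookmark.Name` is available as a key and that inserting one field does not invalidate the other bookmarks. Treat it as a sketch rather than a tested implementation.

```csharp
using System.Collections.Generic;
using GemBox.Document;

static class PageNumbers
{
    // Returns the page number of each given bookmark, keyed by bookmark name,
    // while paginating the document only once.
    public static Dictionary<string, int> GetBookmarkPageNumbers(
        DocumentModel document, IEnumerable<Bookmark> bookmarks)
    {
        // 1. Insert a temporary PAGE field at the start of every bookmark.
        var pageFields = new Dictionary<string, Field>();
        foreach (Bookmark bookmark in bookmarks)
        {
            Field pageField = new Field(document, FieldType.Page);
            ContentPosition position = bookmark.Start.Content.Start;
            pageFields[bookmark.Name] =
                position.InsertRange(pageField.Content).Parent as Field;
        }

        // 2. Paginate once so that all PAGE fields get their values.
        document.GetPaginator(new PaginatorOptions() { UpdateFields = true });

        // 3. Read each field's value, then remove its temporary content.
        var pageNumbers = new Dictionary<string, int>();
        foreach (KeyValuePair<string, Field> entry in pageFields)
        {
            pageNumbers[entry.Key] = int.Parse(entry.Value.Content.ToString());
            entry.Value.Content.Delete();
        }
        return pageNumbers;
    }
}
```

For a single bookmark, it could be called as `PageNumbers.GetBookmarkPageNumbers(document, new[] { document.Bookmarks["My Bookmark"] })`.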

iTextSharp Mixed Alignments in Same Object

I want to have a function that returns one object containing two paragraphs with different alignments. This is easy to do manually by making them separate paragraphs and adding them to the PDF one at a time, but I would like my function to return them as a whole object to be added to a PDF. Is this possible? As an example of what I want:
someTextHere
someMoreTextHere
But as one object which I can then add to a pdf.

I have created a small standalone iText 7 example that renders a gray box containing a left-aligned and a right-aligned paragraph.
The PDF file was created like this:
public void createPdf(String dest) throws IOException {
    PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
    Document document = new Document(pdf);
    Div div = new Div()
        .add(new Paragraph("Left").setTextAlignment(TextAlignment.LEFT))
        .add(new Paragraph("Right").setTextAlignment(TextAlignment.RIGHT))
        .setBackgroundColor(ColorConstants.GRAY)
        .setWidth(200);
    document.add(div);
    document.close();
}
As you can see, I created a Div element (similar to a <div> tag in HTML) to which I added two Paragraph objects with different text alignments. That seems to be exactly what you need.
I am not a C# developer, hence I provide the code in Java. However, if you're proficient in C#, you shouldn't have any problem porting it from Java to C# (it's mostly a matter of capitalizing method names, such as changing add() into Add()).
Note that this is iText 7 code; if you're still using iText 5, you should consider upgrading to the latest iText version, since iText 5 went into maintenance mode a while ago. Maintenance mode means that development on that version has stopped; it's no longer supported for users who aren't customers.
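For reference, a C# port of the Div approach, wrapped in a function that returns a single object as the question asks, might look like the sketch below. It assumes iText 7 for .NET (7.1 or later, where `ColorConstants` is available); `PdfBlocks` and `CreateMixedAlignmentBlock` are hypothetical names chosen for illustration.

```csharp
using iText.Kernel.Colors;
using iText.Layout.Element;
using iText.Layout.Properties;

static class PdfBlocks
{
    // Builds one Div that holds a left-aligned and a right-aligned paragraph.
    // The caller can add the returned Div to a Document as a single element.
    public static Div CreateMixedAlignmentBlock(string leftText, string rightText)
    {
        return new Div()
            .Add(new Paragraph(leftText).SetTextAlignment(TextAlignment.LEFT))
            .Add(new Paragraph(rightText).SetTextAlignment(TextAlignment.RIGHT))
            .SetBackgroundColor(ColorConstants.GRAY)
            .SetWidth(200);
    }
}
```

The result could then be added with `document.Add(PdfBlocks.CreateMixedAlignmentBlock("someTextHere", "someMoreTextHere"));`.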

How can I parse through a table in a pdf file?

I have a custom table with name, first name, place of birth and place of residence in a PDF file, which I want to parse through in C#. One of the simplest ways of doing it would be:
using (PdfLoadedDocument document = new PdfLoadedDocument("foobar"))
{
    for (var i = 0; i < document.Pages.Count; i++)
    {
        Console.WriteLine($"============ PAGE NO. {i+1} ============");
        Console.WriteLine(document.Pages[i].ExtractText());
    }
}
But the problem is the output:
============ PAGE NO. 38 ============
John L.SmithSan Francisco5400 Baden
There's no way I can separate this with a regex, so I need a way to parse through each column of each row in order to get all the customers' values separated. How can I parse through a table in a PDF file with Syncfusion?

You will need a method that returns the coordinates of each character found in the PDF. Then you have some math to do (basically, computing the distance between characters) in order to know whether a character is part of a word and where the word itself is located along the x-axis. It requires quite a lot of work and effort, and I didn't find such a method in the Syncfusion documentation.
I wrote a class which does what you want, but it is for Java projects:
PDFLayoutTextStripper (built on PDFBox)
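The distance-based grouping described above could be sketched in C# as follows, assuming you already have per-character coordinates for one table row from some PDF library. The `Glyph` type, the `RowSplitter` class and both gap thresholds are hypothetical; this is not PDFLayoutTextStripper's algorithm, just a minimal illustration of the idea.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical glyph record: a character with its left edge and width,
// both in the same page units.
sealed class Glyph
{
    public char Character { get; }
    public double X { get; }
    public double Width { get; }

    public Glyph(char character, double x, double width)
    {
        Character = character;
        X = x;
        Width = width;
    }
}

static class RowSplitter
{
    // Splits the glyphs of one text row into table cells.
    // A horizontal gap larger than columnGap starts a new cell;
    // a gap larger than wordGap (but not columnGap) becomes a space.
    public static List<string> SplitIntoCells(
        IEnumerable<Glyph> row, double wordGap, double columnGap)
    {
        var cells = new List<string>();
        var current = new StringBuilder();
        double? previousRight = null;

        foreach (Glyph glyph in row.OrderBy(g => g.X))
        {
            if (previousRight.HasValue)
            {
                double gap = glyph.X - previousRight.Value;
                if (gap > columnGap)
                {
                    cells.Add(current.ToString());
                    current.Clear();
                }
                else if (gap > wordGap)
                {
                    current.Append(' ');
                }
            }
            current.Append(glyph.Character);
            previousRight = glyph.X + glyph.Width;
        }

        if (current.Length > 0)
            cells.Add(current.ToString());
        return cells;
    }
}
```

Suitable thresholds depend on the font size and table layout, so they would need tuning against real documents.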

The Syncfusion control extracts text from a PDF document based on the structure of the content present in the document. So, with the current implementation of the Syncfusion control, we cannot recognize the rows and columns of a table in the PDF document.
Also, it is not possible to extract the text in the same order as it is displayed, since the content of a PDF document follows a fixed layout.
However, we can populate the table of the PDF document in Excel using Tabula (an open-source library). I have modified Tabula Java to achieve layout-based text extraction from the PDF document, based on your requirement.
Please find the sample for this implementation at the link below:
http://www.syncfusion.com/downloads/support/directtrac/171585/ze/TextExtractionSample649531336
Kindly complete the following steps to run the sample:
1. Install the Java Runtime Environment (JRE) from the link below:
http://www.oracle.com/technetwork/java/javase/downloads/
2. Restart your machine.
3. Execute the above sample.
Try this and check whether it meets your requirement.

Prevent Word document's fields from updating when opened

I wrote a utility for another team that recursively goes through folders and converts the Word docs found to PDF by using Word Interop with C#.
The problem we're having is that the documents were created with date fields that update to today's date before they get saved out. I found a method to disable updating fields before printing, but I need to prevent the fields from updating on open.
Is that possible? I'd like to do the fix in C#, but if I have to do a Word macro, I can.

As described in Microsoft's endless maze of documentation you can lock the field code. For example in VBA if I have a single date field in the body in the form of
{DATE \# "M/d/yyyy h:mm:ss am/pm" \* MERGEFORMAT }
I can run
ActiveDocument.Fields(1).Locked = True
Then if I make a change to the document, save, then re-open, the field code will not update.
Example using C# Office Interop:
Word.Application wordApp = new Word.Application();
Word.Document wordDoc = wordApp.ActiveDocument; // requires a document to be open in this Word instance
wordDoc.Fields.Locked = 1; // Locked is apparently an Int32 rather than a bool
You can place the code in the DocumentOpen event. I'm assuming you have an add-in which subscribes to the event. If not, clarify, as that can be a battle on its own.
EDIT: In my testing, locking fields in this manner locks them across all StoryRanges, so there is no need to get the field instances in headers, footers, footnotes, textboxes, and so on. This is a pleasant surprise.

Well, I didn't find a way to do it with Interop, but my company did buy Aspose.Words and I wrote a utility to convert the Word docs to TIFF images. The Aspose tool won't update fields unless you explicitly tell it to. Here's a sample of the code I used with Aspose. Keep in mind, I had a requirement to convert the Word docs to single page TIFF images and I hard-coded many of the options because it was just a utility for myself on this project.
private static bool ConvertWordToTiff(string inputFilePath, string outputFilePath)
{
    try
    {
        Document doc = new Document(inputFilePath);
        for (int i = 0; i < doc.PageCount; i++)
        {
            ImageSaveOptions options = new ImageSaveOptions(SaveFormat.Tiff);
            options.PageIndex = i;
            options.PageCount = 1;
            options.TiffCompression = TiffCompression.Lzw;
            options.Resolution = 200;
            options.ImageColorMode = ImageColorMode.BlackAndWhite;
            var extension = Path.GetExtension(outputFilePath);
            var pageNum = String.Format("-{0:000}", (i+1));
            var outputPageFilePath = outputFilePath.Replace(extension, pageNum + extension);
            doc.Save(outputPageFilePath, options);
        }
        return true;
    }
    catch (Exception ex)
    {
        LogError(ex);
        return false;
    }
}

I think a new question on SO is appropriate then, because this will require XML processing rather than just Office Interop. If you have both .doc and .docx file types to convert, you might require two separate solutions: one for WordML (Word 2003 XML format) and another for OpenXML (Word 2007/2010/2013 XML format), since you cannot open the old file format and save it as the new one without the fields updating.
Inspecting the OOXML of a locked field shows a w:fldLock="1" attribute. This can be inserted using appropriate XML processing against the document, such as through the OOXML SDK or through a standard XSLT transform.
Might be helpful: the question "How do I unlock a content control using the OpenXML SDK in a Word 2010 document?" might cover a similar situation, but for content controls. You may be able to apply the same solution to fields, if the Lock and LockingValues types apply the same way to fields. I am not certain of this, however.
To give more confidence that this is the way to do it, see the example of this vendor's solution for the problem. If you need to develop this in-house, then openxmldeveloper.org is a good place to start; look for Eric White's examples for manipulating fields such as this.
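As a rough illustration of the XML-processing route, the sketch below adds w:fldLock="1" to the fields in a .docx file using only System.Xml.Linq and System.IO.Compression, rather than the OOXML SDK. `FieldLocker` and its methods are hypothetical names. It assumes the main part is stored at word/document.xml (the usual location, but not guaranteed), and it does not touch headers, footers or footnotes, which live in separate parts. On .NET Framework, `ZipFile` also needs a reference to System.IO.Compression.FileSystem.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

static class FieldLocker
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Adds w:fldLock="1" to every simple field (w:fldSimple) and to the
    // "begin" marker (w:fldChar) of every complex field.
    // Returns the number of elements that were marked.
    public static int LockFields(XDocument part)
    {
        var simpleFields = part.Descendants(W + "fldSimple");
        var complexBegins = part.Descendants(W + "fldChar")
            .Where(c => (string)c.Attribute(W + "fldCharType") == "begin");

        int count = 0;
        foreach (XElement field in simpleFields.Concat(complexBegins).ToList())
        {
            field.SetAttributeValue(W + "fldLock", "1");
            count++;
        }
        return count;
    }

    // Opens the .docx package in place and locks the fields of its main part.
    public static int LockFieldsInDocx(string docxPath)
    {
        using (ZipArchive zip = ZipFile.Open(docxPath, ZipArchiveMode.Update))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new FileNotFoundException("Main document part not found.");

            using (Stream stream = entry.Open())
            {
                // Preserve whitespace so that text runs are not altered.
                XDocument part = XDocument.Load(stream, LoadOptions.PreserveWhitespace);
                int count = LockFields(part);

                // Overwrite the entry with the modified XML.
                stream.Position = 0;
                stream.SetLength(0);
                part.Save(stream, SaveOptions.DisableFormatting);
                return count;
            }
        }
    }
}
```

Because this edits the file without opening it in Word, the date fields keep their stored values until they are unlocked again. It applies only to .docx files, not to the older .doc format.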

Print Adobe Illustrator documents

I have one file called test.ai and I need to print it several times, but changing the text inside it each time.
I added the Illustrator reference to the project and it is already changing the text inside the image. My problem is how to stack up several of these documents and send them to a printer or to the printing dialog.
Here is the code to open the file
//open AI, init
Illustrator.Application illuApp = new Illustrator.Application();
// open doc
Illustrator.Document illuDoc = illuApp.Open("C:\\myai.ai", Illustrator.AiDocumentColorSpace.aiDocumentRGBColor, null);
There is this illuDoc.PrintOut function, which takes one options object as a parameter, but I can't seem to find the documentation for it, and I don't know if it could help in my situation.
How could I achieve this?
Thanks!
Jonathan

According to the documentation I find here (I assume this is the library that you're using?), the PrintOut function takes PrintOptions as an argument.
PrintOptions collects all information about all printing options including flattening, color management, coordinates, fonts, and paper. Used as an argument to the PrintOut method. (page 184)
You should be able to set up a loop in your code with the number of iterations equal to the number of documents that you want printed, and in the body of that loop, make the change to the text of the document and call the PrintOut function for that document with the appropriate PrintOptions parameters.

Your best bet is to avoid any AI references for direct printing. The storage format for an AI file is nearly identical to a PDF (make a copy and change the extension from .ai to .pdf and be amazed). This opens the door to using any pdf printing method for your Illustrator file.
