Read file details (not language dependent) - c#

In C#, I would like to read file details from a specific file.
I've found an interesting thread: Read/Write 'Extended' file properties (C#)
it uses a call to the GetDetailsOf() method on the folder shell object included in shell32.dll.
It works fine, but I have an issue: the header string depends on the operating system language (for example, the filename property is 'Name' on an English Windows and 'Nom' on a French Windows).
So it's not easy to retrieve a specific value by property name, because the name changes with the language.
Is there a way to handle this easily?

Some properties are available through the FileInfo object. For example, if you want the creation time of the file you can do:
FileInfo myFileInfo = new FileInfo(@"C:\path\to\file");
DateTime ftime = myFileInfo.CreationTime;

Is the FileInfo class not enough for your needs ?
FileInfo info = new FileInfo("fileName");
var name = info.Name;
var creationTime = info.CreationTime;
// etc ...
If not, tell us more about which properties you'd like to read from your file.
Update to my answer:
I don't know of a library that can read the properties of any type of document.
But here are a few options for the formats you mentioned.
PDF :
Extracting Additional Metadata from a PDF using iTextSharp
Read/Modify PDF Metadata using iTextSharp
So, iText ® is a library that allows you to create and manipulate PDF documents (from their website)
Office: (the first link, from MS, states that it applies to Word as well as Excel documents)
How to: Read from and Write to Document Properties
Listing properties of a word document in C#

Related

Creating embedded resource in C# class library if it doesn't exist

I have an issue with a class library; I am preparing a library with an interface that represents a specific data storage signature. The purpose is to use the interface as a basis for implementing a number of specific classes storing configuration information in different formats (text files, xml files, etc.) while retaining the same usage profile to the application using it. I have a problem, though. In this case I am trying to embed an xml file as a resource - this file is one type of format to store configuration data. The file is located as an embedded resource in a subfolder to the project, as shown in the attached illustration.
In the following code snippet it is shown how I have implemented the functionality until now.
public ConfigInfoXmlSource()
{
if (!string.IsNullOrEmpty(Settings.Default.CurrentConfigFile))
FileNameAndPath = Settings.Default.CurrentConfigFile;
else
FileNameAndPath = DefaultConfigFileName + DefaultFileExtension;
// Prepare XML.
System.Reflection.Assembly a = Assembly.GetExecutingAssembly();
XmlDocument doc = new XmlDocument();
Stream manifestResourceStream =
a.GetManifestResourceStream("TestTool.Config.Config1.xml");
if (manifestResourceStream == null)
{
// ???
}
...
doc.Load(manifestResourceStream);
...
}
In the section marked "Prepare XML" I am trying to read a stream from the embedded resource. After the reading, it is tested whether a stream was indeed created. If the file is found, the manifestResourceStream will contain the xml data - so far so good. The problem arises if the file for some reason has been accidentally deleted - in that case I want to create a new file as an embedded resource to replace the deleted file. That is supposed to happen in the conditional in the part shown as "???".
I have tried everything I could think of, searched Google for answers, etc. - to no avail.
Does anyone have a clue to how this is accomplished? Any help will be greatly appreciated.
Thanks in advance.
Best regards.
An embedded resource is built into your binaries. It is not a physical file, but something that lives inside the built file (the DLL in this case). So once it is included, I do not think it can ever be deleted. As far as I know, an embedded resource can only be set while building your project binaries; you cannot add one explicitly at runtime, and for the reasons above you should not need to.
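If you still want the "???" branch to do something sensible, one option is to fall back to a default document built in code instead of trying to recreate the resource. This is only a minimal sketch: the ConfigLoader class and the DefaultXml content are hypothetical and not part of the original project.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Xml.Linq;

public static class ConfigLoader
{
    // Hypothetical default content; replace it with whatever your real Config1.xml holds.
    private const string DefaultXml =
        "<config><setting name=\"example\" value=\"default\" /></config>";

    // Loads the XML from the embedded resource if it exists,
    // otherwise falls back to the built-in default document.
    public static XDocument Load(Assembly assembly, string resourceName)
    {
        // GetManifestResourceStream returns null when the resource name is not found.
        using (Stream stream = assembly.GetManifestResourceStream(resourceName))
        {
            if (stream == null)
            {
                return XDocument.Parse(DefaultXml);
            }
            return XDocument.Load(stream);
        }
    }
}
```

In the constructor from the question this would be called as ConfigLoader.Load(Assembly.GetExecutingAssembly(), "TestTool.Config.Config1.xml"). If the fallback document should survive a restart, save it to a file on disk, since it cannot be written back into the assembly.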

C# OpenXML Word - How to create a VBA macro?

I'm trying to create a macro in a new dotm word file created with OpenXML. I guess I have to add a VBAProjectPart but I can not go on.
The macro is stored in a string variable : for example
string tmpMacro = "Private Sub Add_Pages()\nDim tmpPages As Integer\ntmpPages = Selection.Information(wdNumberOfPagesInDocument)\nSelection.EndKey Unit:= wdStory\nDo While Selection.Information(wdNumberOfPagesInDocument) < 10\nSelection.InsertBreak(wdPageBreak)\nLoop\nEnd Sub";
WordprocessingDocument tmpWD = WordprocessingDocument.Create("myDoc.docm", DocumentFormat.OpenXml.WordprocessingDocumentType.MacroEnabledDocument);
MainDocumentPart tmpMDP = tmpWD.AddMainDocumentPart();
tmpMDP.Document = new Document(new Body());
tmpWD.Close();
In OpenXML, macros are a combination of binary format and XML relation files.
To verify this for yourself, create a new Word/Excel file, create a new macro, and save it as a macro-enabled document/workbook. Close the file and rename it to end with .zip.
In the main directory, you will find the file [Content_Types].xml, inside of which there are two relation pointers:
<Default Extension="bin"
ContentType="application/vnd.ms-office.vbaProject"/>
<Override PartName="/word/vbaData.xml"
ContentType="application/vnd.ms-word.vbaData+xml"/>
To follow these files, locate word/vbaData.xml, inside of which there will be something like:
<wne:vbaSuppData ...namespaces omitted... >
<wne:mcds>
<wne:mcd wne:macroName="PROJECT.NEWMACROS.MACRO1"
wne:name="Project.NewMacros.Macro1"
wne:bEncrypt="00"
wne:cmg="56"/>
</wne:mcds>
</wne:vbaSuppData>
This shows that there is some macro named Project.NewMacros.Macro1, but little else. So let's look inside of word/_rels/document.xml.rels:
<Relationships ...namespaces omitted...>
<Relationship Id="rId1"
Type="http://schemas.microsoft.com/office/2006/relationships/vbaProject"
Target="vbaProject.bin"/>
...other relationships omitted...
</Relationships>
This points to word/vbaProject.bin, which is a binary file format.
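The same inspection can be done from code instead of renaming the file to .zip. A minimal sketch using only System.IO.Compression; the MacroInspector class is a hypothetical helper, and the two part names are the ones shown above for a Word document.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class MacroInspector
{
    // Returns the names of the VBA-related parts found in a macro-enabled Word package.
    // The stream is left open so the caller keeps ownership of it.
    public static string[] FindVbaParts(Stream package)
    {
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, true))
        {
            return zip.Entries
                .Select(entry => entry.FullName)
                .Where(name => name == "word/vbaProject.bin" || name == "word/vbaData.xml")
                .OrderBy(name => name, StringComparer.Ordinal)
                .ToArray();
        }
    }
}
```

An empty result means the package carries no macro project, for example because it was saved as a plain .docx.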
If you need to add this macro programmatically (i.e. you cannot generate everything else in code and then add the macro by hand), then you could create the macro in one document manually, and then programmatically copy the binary stream from the manually created vbaProject.bin file into a new vbaProject.bin file.
If you decide to follow the stream copy approach, the answer to this question includes a snippet demonstrating one way to do so.
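For reference, a rough sketch of the stream copy with the Open XML SDK. It assumes the DocumentFormat.OpenXml package is referenced and that sourceDocm is a document in which the macro was already created by hand; MacroCopier is a hypothetical name. It copies only vbaProject.bin, not vbaData.xml, so treat it as a starting point and not as a complete solution.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class MacroCopier
{
    // Creates a new macro-enabled document and copies the binary VBA project
    // from an existing, manually prepared macro-enabled document into it.
    public static void CreateWithMacro(string sourceDocm, string targetDocm)
    {
        using (WordprocessingDocument source = WordprocessingDocument.Open(sourceDocm, false))
        using (WordprocessingDocument target = WordprocessingDocument.Create(
            targetDocm, WordprocessingDocumentType.MacroEnabledDocument))
        {
            MainDocumentPart targetMain = target.AddMainDocumentPart();
            targetMain.Document = new Document(new Body());

            VbaProjectPart sourceVba = source.MainDocumentPart.VbaProjectPart;
            if (sourceVba == null)
            {
                throw new InvalidOperationException("The source document has no VBA project.");
            }

            // AddNewPart also creates the relationship and the content type entry.
            VbaProjectPart targetVba = targetMain.AddNewPart<VbaProjectPart>();
            using (Stream data = sourceVba.GetStream())
            {
                targetVba.FeedData(data);
            }

            targetMain.Document.Save();
        }
    }
}
```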

Reading images from Word files using C# Word API without using Clipboard

I've been working on an application to read images from multiple word files and store them in one single word file using Microsoft.Office.Interop.Word in C#
EDIT: I also need to save a copy of the images on the file system, so I need the image in a Bitmap or similar object.
This is my implementation so far, which works fine:
foreach (InlineShape shape in doc.InlineShapes)
{
shape.Range.Select();
if (shape.Type == WdInlineShapeType.wdInlineShapePicture)
{
doc.ActiveWindow.Selection.Range.CopyAsPicture();
ImageData = Clipboard.GetDataObject();
object _ob1 = ImageData.GetData(DataFormats.Bitmap);
bmp = (Bitmap)_ob1;
images[i++] = bmp;
/*
bmp.Save("C:\\Users\\Akshay\\Pictures\\bitmaps\\test" + i.ToString() + ".bmp");
*/
}
}
I have:
Selected the images as InlineShapes
Copied the shape into Clipboard
Stored the shape in the Clipboard in a DataObject
Extracted the shape from the DataObject in Bitmap format and stored in a Bitmap object.
I've been told to refrain from using Clipboard in Word automation and use the Word APIs instead.
I've read up on it and found an SO answer stating the same.
I looked up many implementations of reading images from Word files on MSDN, SO etc. but could not find any without using clipboard.
How do I read images from Word files using the Word APIs from Microsoft.Office.Interop.Word namespace alone without using Clipboard ?
When a Word document in the Office Open XML file format is expressed as a single XML file, its images are stored in Base64. So it should be possible to extract that information and convert/stream it to a file. You can access the information while the document is open in the Word application through the Range.WordOpenXML property.
string shapeBase64 = shape.Range.WordOpenXML;
This will return the entire Word Open XML in the flat file OPC format. In other words, it won't contain only the picture in Base64, but the entire zip package definition as XML that surrounds it. In my quick test, the tag that contains the actual Base64 is
<pkg:binaryData>
That's a child element of
<pkg:part pkg:name="/word/media/image1.jpg" pkg:contentType="image/jpeg" pkg:compression="store">
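As a minimal sketch of pulling the picture bytes out of that string: the code below parses the flat OPC XML with LINQ to XML and decodes every pkg:binaryData element whose part name starts with /word/media/. FlatOpcImages is a hypothetical helper, and the namespace URI is the one the flat OPC format uses for the pkg prefix.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class FlatOpcImages
{
    private static readonly XNamespace Pkg =
        "http://schemas.microsoft.com/office/2006/xmlPackage";

    // Maps each media part name (e.g. "/word/media/image1.jpg") to its decoded bytes.
    public static Dictionary<string, byte[]> Extract(string flatOpcXml)
    {
        var images = new Dictionary<string, byte[]>();
        XDocument doc = XDocument.Parse(flatOpcXml);

        foreach (XElement part in doc.Descendants(Pkg + "part"))
        {
            string name = (string)part.Attribute(Pkg + "name");
            XElement data = part.Element(Pkg + "binaryData");
            if (name != null && data != null &&
                name.StartsWith("/word/media/", StringComparison.Ordinal))
            {
                // FromBase64String ignores the line breaks inside the element.
                images[name] = Convert.FromBase64String(data.Value);
            }
        }
        return images;
    }
}
```

Called as FlatOpcImages.Extract(shape.Range.WordOpenXML), the resulting bytes can be written with File.WriteAllBytes or wrapped in a MemoryStream and passed to the Bitmap constructor, which covers the requirement of keeping a copy on the file system.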
Note that it would also be possible for you to get the entire document's WordOpenXML in one step:
document.Content.WordOpenXML
but you might then need to understand the way the InlineShapes in the document body are linked to the actual information in the "media" part.
And it would be possible, of course, to work directly with the Zip Package (using the Open XML SDK, perhaps) instead of opening the document in the Word.Application.
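A sketch of the zip route, without the Open XML SDK and without starting Word: every entry under word/media/ in the .docx package is an image file and can be extracted as it is. DocxMedia is a hypothetical helper; it assumes a reference to System.IO.Compression and System.IO.Compression.FileSystem, and that the document is not open in Word at the time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class DocxMedia
{
    // Extracts every file under word/media/ to outputDir and returns the written paths.
    public static List<string> ExtractTo(string docxPath, string outputDir)
    {
        var written = new List<string>();
        Directory.CreateDirectory(outputDir);

        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                bool isMedia = entry.FullName.StartsWith("word/media/", StringComparison.OrdinalIgnoreCase);
                if (!isMedia || entry.Name.Length == 0)
                {
                    continue; // skip non-media parts and directory entries
                }

                string target = Path.Combine(outputDir, entry.Name);
                entry.ExtractToFile(target, true); // overwrite an existing file
                written.Add(target);
            }
        }
        return written;
    }
}
```

This gives you the images in their stored format (jpg, png, and so on), but not the order in which they appear in the document body.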

Build Word Document from template

I have a request to create a word document on the fly based on a template provided to me. I have done some research and everything seems to point at OpenXML. I have looked into that, but the cs file that gets created is over 15k lines and is breaking my VS 2010 (causing it to not respond every time I make a change).
I have been looking at this tutorial series on Open XML
http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2011/10/13/getting-started-with-open-xml-development.aspx
I have done things in the past with text files and Regular Expressions, but since Word encrypts everything, that does not work. Are there any other fairly lightweight options for creating Word documents from templates?
//Hi, it is quite simple.
//First, you should copy your template file to another location.
string SourcePath = "C:\\MyTemplate.dotx";
string DestPath = "C:\\MyDocument.docx";
System.IO.File.Copy(SourcePath, DestPath);
//After copying the file, you can open a WordprocessingDocument using your destination path.
using (WordprocessingDocument mydoc = WordprocessingDocument.Open(DestPath, true))
{
    //After opening your document, you can change its type (the copy is still a template).
    mydoc.ChangeDocumentType(WordprocessingDocumentType.Document);
    //If you wish, you can edit your document, for example attach the original template.
    AttachedTemplate attachedTemplate1 = new AttachedTemplate() { Id = "MyRelationID" };
    MainDocumentPart mainPart = mydoc.MainDocumentPart;
    DocumentSettingsPart mySettingsPart = mainPart.DocumentSettingsPart;
    mySettingsPart.Settings.Append(attachedTemplate1);
    //The original snippet passed an undefined CopyPath here; the template path is assumed.
    mySettingsPart.AddExternalRelationship("http://schemas.openxmlformats.org/officeDocument/2006/relationships/attachedTemplate", new Uri(SourcePath, UriKind.Absolute), "MyRelationID");
    //Finally you can save your document.
    mainPart.Document.Save();
}
I am currently working on something along these lines, and I have been using the Open XML SDK and the OpenXmlPowerTools. The approach is to open the actual template file and put text into various placeholders within the template document. I have been using content controls as the placeholders.
The SDK tool for opening up a document has been invaluable for comparing documents and seeing how they are constructed. However, I have been heavily refactoring the code the tool generates and removing sections that are not used at all.
I can't speak for doc files, but docx files are not encrypted; they are just zip files that contain XML files.
Eric White's blog has a large number of examples and code samples, which have been very useful.
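Because a docx is just a zip of XML files, a very lightweight variant of the placeholder idea needs nothing beyond the framework. This is a sketch under assumptions: it uses plain text tokens such as {{Name}} (a hypothetical convention, not the content controls mentioned above), and it only finds a token that sits inside a single w:t element. Word often splits text across several runs, so type the tokens in one go or use content controls for anything serious.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

public static class TemplateFiller
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces each token (dictionary key) with its value in word/document.xml.
    public static void Fill(string docxPath, IDictionary<string, string> values)
    {
        const string partName = "word/document.xml";

        using (ZipArchive zip = ZipFile.Open(docxPath, ZipArchiveMode.Update))
        {
            ZipArchiveEntry entry = zip.GetEntry(partName);
            if (entry == null)
            {
                throw new InvalidDataException(partName + " not found in package.");
            }

            XDocument xml;
            using (Stream input = entry.Open())
            {
                xml = XDocument.Load(input, LoadOptions.PreserveWhitespace);
            }

            // Copy to a list first so the tree is not modified while it is enumerated.
            foreach (XElement text in new List<XElement>(xml.Descendants(W + "t")))
            {
                foreach (KeyValuePair<string, string> pair in values)
                {
                    text.Value = text.Value.Replace(pair.Key, pair.Value);
                }
            }

            // Replace the old part with the modified XML.
            entry.Delete();
            using (Stream output = zip.CreateEntry(partName).Open())
            {
                xml.Save(output, SaveOptions.DisableFormatting);
            }
        }
    }
}
```

Copy the template to the destination path first and call Fill on the copy, so the template itself stays untouched.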

How to convert a PDF file to a text file using C#.NET

Currently I am using the following code, together with some DLL files from PDFBox:
FileInfo file = new FileInfo("c://aa.pdf");
PDDocument doc = PDDocument.load(file.FullName);
PDFTextStripper pdfStripper = new PDFTextStripper();
string text = pdfStripper.getText (doc);
richTextBox1.Text = text;
With this code I am able to get the text, but not in the correct format. Please give me some ideas.
Extracting the text from a pdf file is anything but trivial.
To quote from the iTextSharp tutorial:
"The pdf format is just a canvas where text and graphics are placed without any structure information. As such there aren't any 'iText-objects' in a PDF file. In each page there will probably be a number of 'Strings', but you can't reconstruct a phrase or a paragraph using these strings. There are probably a number of lines drawn, but you can't retrieve a Table-object based on these lines. In short: parsing the content of a PDF-file is NOT POSSIBLE with iText."
There are several commercial applications which claim to be able to do it. Caveat Emptor.
There is also a free software library called Poppler (http://poppler.freedesktop.org/), which is used by the PDF viewers of GNOME and KDE. It comes with a command-line tool called pdftotext, but I have no experience with it. It may be your best free option.
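pdftotext is normally run as a separate program, so from C# the simplest route is to start it as a process. A minimal sketch, assuming the pdftotext executable is installed and on the PATH; the PdfToText class is a hypothetical wrapper, and -layout is the option that asks the tool to keep the physical layout of the page.

```csharp
using System;
using System.Diagnostics;

public static class PdfToText
{
    // Builds the command line: pdftotext -layout "<input.pdf>" "<output.txt>"
    public static ProcessStartInfo BuildStartInfo(string pdfPath, string txtPath)
    {
        return new ProcessStartInfo
        {
            FileName = "pdftotext", // assumed to be on the PATH
            Arguments = "-layout \"" + pdfPath + "\" \"" + txtPath + "\"",
            UseShellExecute = false,
            CreateNoWindow = true
        };
    }

    // Runs the conversion and returns the exit code of the tool (0 means success).
    public static int Run(string pdfPath, string txtPath)
    {
        using (Process process = Process.Start(BuildStartInfo(pdfPath, txtPath)))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

After a successful run, File.ReadAllText on the output path gives you the text for the rich text box. How good the layout is still depends on the PDF itself, for the reasons quoted above.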
There is a blog article explaining the issues with PDF text extraction in general at http://pdf.jpedal.org/java-pdf-blog/bid/12670/PDF-text
