Load an IFilter without the Registry - c#

So I've been told I might not have access to the registry, or to the programs that usually install their IFilters on the system, so I have to ship the IFilter DLLs with the application and load them directly from there. I'm currently using CodeProject's C# IFilter classes, but there are still a few things that are over my head when it comes to the filterPersistClass, persistentHandlerClass and COM, and as such I am a bit lost on how to get this to work.
I've done all the mundane stuff, like getting the DLLs and setting up a resource file with "Extension, DLL Path" entries, but I just can't seem to grasp how to actually load the IFilter DLL. Maybe I should just start from scratch, but I thought I would ask for some help first.
EDIT (Partial Solution)
Well I figured out how to load query.dll using the code below in the FilterReader constructor in FilterReader.cs, though I'm having problems now loading the PDFFilter.dll file and am getting the following error:
Unable to find an entry point named 'LoadIFilter' in DLL 'C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin\PDFFilter.dll'
The problem I think I am now stuck at is that PDFFilter.dll uses STA and C# applications are MTA.
[DllImport("query.dll", SetLastError = true, CharSet = CharSet.Unicode)]
static extern int LoadIFilter(string pwcsPath, [MarshalAs(UnmanagedType.IUnknown)] ref object pUnkOuter, ref IFilter ppIUnk);
// --------------------------- constructor ----------------------------------
var isFilter = false;
object iUnknown = null;
LoadIFilter(fileName, ref iUnknown, ref _filter);
var persistFile = (_filter as IPersistFile);
if (persistFile != null)
{
persistFile.Load(fileName, 0);
IFILTER_FLAGS flags;
IFILTER_INIT iflags =
IFILTER_INIT.CANON_HYPHENS |
IFILTER_INIT.CANON_PARAGRAPHS |
IFILTER_INIT.CANON_SPACES |
IFILTER_INIT.APPLY_INDEX_ATTRIBUTES |
IFILTER_INIT.HARD_LINE_BREAKS |
IFILTER_INIT.FILTER_OWNED_VALUE_OK;
if (_filter.Init(iflags, 0, IntPtr.Zero, out flags) == IFilterReturnCode.S_OK)
isFilter = true;
}
if (_filter != null && isFilter) return;
if (_filter != null) Marshal.ReleaseComObject(_filter);

There is nothing magical about IFilter objects. They are housed in standard COM DLLs. In the end, all you need is the clsid of the class that knows how to process PDF files.
The LoadIFilter function in query.dll is just a convenient helper function. Everything it does you can do yourself.
There is a standard way, in the registry, to resolve a file extension (e.g. .pdf) to a clsid (e.g. {E8978DA6-047F-4E3D-9C78-CDBE46041603}).
Note: You could also just skip to the end, and know that the clsid of Adobe's IFilter implementation is {E8978DA6-047F-4E3D-9C78-CDBE46041603}. But that's not guaranteed, so you need to crawl the registry.
The algorithm to resolve an .ext to the clsid of an object that implements IFilter is:
GetIFilterClassIDForFileExtension(String extension)
arguments:
extension (String) e.g. ".pdf"
returns:
clsid (Guid) e.g. {E8978DA6-047F-4E3D-9C78-CDBE46041603}
//Get the Persistent Handler for this extension
//e.g.
// HKLM\Software\Classes\.pdf\PersistentHandler\(Default)
//returns
// "{F6594A6D-D57F-4EFD-B2C3-DCD9779E382E}"
persistentHandlerGuid = HKLM\Software\Classes\<extension>\PersistentHandler\(Default)
//Get the clsid associated with this persistent handler
//e.g.
// HKLM\Software\Classes\CLSID\{F6594A6D-D57F-4EFD-B2C3-DCD9779E382E}\PersistentAddinsRegistered\{89BCB740-6119-101A-BCB7-00DD010655AF}\(Default)
//where the final guid is the interface identifier (IID) of IFilter
clsid = HKLM\Software\Classes\CLSID\<persistentHandlerGuid>\PersistentAddinsRegistered\{89BCB740-6119-101A-BCB7-00DD010655AF}\(Default)
//e.g. returns "{E8978DA6-047F-4E3D-9C78-CDBE46041603}", the clsid of Adobe's PDF IFilter
return clsid
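The pseudocode above translates fairly directly into C#. The sketch below is one possible version; IFilterResolver and the readDefault delegate are hypothetical names, and readDefault stands in for "read the (Default) value of this HKLM key" so the two lookups can be followed (and tested) without a real registry:

```csharp
using System;
using System.Collections.Generic;

static class IFilterResolver
{
    // IID of IFilter; a persistent handler lists its filter's clsid under this subkey.
    const string IFilterIid = "{89BCB740-6119-101A-BCB7-00DD010655AF}";

    // readDefault(path) must return the (Default) value of HKLM\<path>, or null if the key is missing.
    // Returns the clsid of the IFilter for the extension (e.g. ".pdf"), or null if none is registered.
    public static Guid? GetIFilterClassIdForFileExtension(string extension, Func<string, string> readDefault)
    {
        // Step 1: extension -> persistent handler guid
        string handler = readDefault($@"Software\Classes\{extension}\PersistentHandler");
        if (handler == null) return null;

        // Step 2: persistent handler -> clsid registered for the IFilter interface
        string clsid = readDefault($@"Software\Classes\CLSID\{handler}\PersistentAddinsRegistered\{IFilterIid}");
        return Guid.TryParse(clsid, out Guid result) ? result : (Guid?)null;
    }
}
```

On Windows, readDefault could open the key with Registry.LocalMachine.OpenSubKey(path) and return GetValue(null) as a string. Note that a 32-bit process sees the 32-bit registry view, so the lookup may only succeed when the process bitness matches the installed filter.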
Once you have the clsid of the appropriate object, you create it with:
Guid clsid = GetIFilterClassIDForFileExtension(".pdf");
IFilter filter = CreateComObject(clsid) as IFilter;
You now have the entire guts of the LoadIFilter function from query.dll:
IFilter LoadIFilter(String filename)
{
String extension = ExtractFileExt(filename); //e.g. "foo.pdf" --> ".pdf"
Guid clsid = GetIFilterClassIDForFileExtension(extension);
return CreateComObject(clsid) as IFilter;
}
Now, all that still requires the registry, because you still have to be able to resolve an extension into a clsid. If you already know the clsid, you can skip that lookup (CoCreateInstance itself still uses the registry to map the clsid to the filter's DLL, so the filter must be registered):
IFilter adobeIFilterForPdfs = CreateComObject(new Guid("{E8978DA6-047F-4E3D-9C78-CDBE46041603}")) as IFilter;
And you're good to go.
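If the filter DLL is not registered at all, CoCreateInstance cannot find it. A common alternative, sketched here under the assumption that the DLL is an ordinary in-process COM server exporting DllGetClassObject, is to load it yourself and ask its class factory directly. UnregisteredCom and CreateFromDll are hypothetical names; the Win32/COM calls (LoadLibraryEx, GetProcAddress, DllGetClassObject, IClassFactory) are the standard ones:

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;

static class UnregisteredCom
{
    const uint LOAD_WITH_ALTERED_SEARCH_PATH = 0x00000008; // resolve the DLL's dependencies from its own folder

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr LoadLibraryEx(string lpFileName, IntPtr hFile, uint dwFlags);

    [DllImport("kernel32.dll", CharSet = CharSet.Ansi, ExactSpelling = true, SetLastError = true)]
    static extern IntPtr GetProcAddress(IntPtr hModule, string procName);

    // Signature of the DllGetClassObject export of an in-process COM server.
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    delegate int DllGetClassObjectFn([In] ref Guid rclsid, [In] ref Guid riid,
        [MarshalAs(UnmanagedType.IUnknown)] out object ppv);

    [ComImport, Guid("00000001-0000-0000-C000-000000000046"),
     InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    interface IClassFactory
    {
        void CreateInstance([MarshalAs(UnmanagedType.IUnknown)] object pUnkOuter, [In] ref Guid riid,
            [MarshalAs(UnmanagedType.IUnknown)] out object ppvObject);
        void LockServer([MarshalAs(UnmanagedType.Bool)] bool fLock);
    }

    static readonly Guid IID_IUnknown = new Guid("00000000-0000-0000-C000-000000000046");

    // Creates clsid from dllPath without the HKCR\CLSID lookup CoCreateInstance performs.
    // The module is deliberately never freed, since objects created from it may still be alive.
    public static object CreateFromDll(string dllPath, Guid clsid)
    {
        IntPtr module = LoadLibraryEx(Path.GetFullPath(dllPath), IntPtr.Zero, LOAD_WITH_ALTERED_SEARCH_PATH);
        if (module == IntPtr.Zero)
            throw new Win32Exception(Marshal.GetLastWin32Error(), "LoadLibraryEx failed: " + dllPath);

        IntPtr proc = GetProcAddress(module, "DllGetClassObject");
        if (proc == IntPtr.Zero)
            throw new Win32Exception(Marshal.GetLastWin32Error(), "DllGetClassObject not exported");

        var getClassObject = Marshal.GetDelegateForFunctionPointer<DllGetClassObjectFn>(proc);
        Guid iidFactory = typeof(IClassFactory).GUID;
        Marshal.ThrowExceptionForHR(getClassObject(ref clsid, ref iidFactory, out object factoryObj));

        var factory = (IClassFactory)factoryObj;
        Guid iidUnknown = IID_IUnknown;
        factory.CreateInstance(null, ref iidUnknown, out object instance);
        return instance;
    }
}
```

For example, (IFilter)UnregisteredCom.CreateFromDll(@"C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin\PDFFilter.dll", new Guid("E8978DA6-047F-4E3D-9C78-CDBE46041603")). Because the DLL's ThreadingModel registration is never consulted this way, using the object only from the thread that created it is the cautious choice, and the process bitness must match the DLL's.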
The important point is that the function you're trying to call, LoadIFilter, is not inside Adobe's DLL (or any other IFilter DLL provided by any other company for other file types). LoadIFilter is exported by query.dll and is simply a helper function for the steps I described above.
All IFilter DLLs are COM DLLs. The documented way to create an object from a COM DLL is the CoCreateInstance function:
IUnknown* CreateComObject(REFCLSID classID)
{
IUnknown* unk = nullptr;
HRESULT hr = CoCreateInstance(classID, nullptr, CLSCTX_INPROC_SERVER | CLSCTX_LOCAL_SERVER, IID_IUnknown, reinterpret_cast<void**>(&unk));
if (FAILED(hr))
throw _com_error(hr); // _com_error (from <comdef.h>) carries the failing HRESULT
return unk;
}
I'll leave it to you to find the correct way to create a COM object from C# managed code. I've forgotten.
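On the managed side, one common way to do this (a sketch, not the only option) is Type.GetTypeFromCLSID followed by Activator.CreateInstance. This goes through CoCreateInstance, so the class must be registered. ComActivation is a hypothetical helper name:

```csharp
using System;
using System.Runtime.InteropServices;

static class ComActivation
{
    // Managed counterpart of the CreateComObject pseudocode above.
    // Activator.CreateInstance throws a COMException (e.g. REGDB_E_CLASSNOTREG)
    // if the class cannot be created.
    public static object CreateComObject(Guid clsid)
    {
        Type comType = Type.GetTypeFromCLSID(clsid, throwOnError: true);
        return Activator.CreateInstance(comType);
    }
}
```

For example, var filter = (IFilter)ComActivation.CreateComObject(new Guid("E8978DA6-047F-4E3D-9C78-CDBE46041603")); after which the question's IPersistFile.Load and IFilter.Init calls can be used as before (IFilter being the [ComImport] interface from the CodeProject classes).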
Note: Any code released into public domain. No attribution required.

Related

Accessing an ITypeInfo that references an ITypeInfo from an importlib-ed unregistered type library causes TYPE_E_CANTLOADLIBRARY error

I am using the Automation API from .NET (System.Runtime.InteropServices.ComTypes) to inspect a type library Bar.tlb that I generated myself (see below). This type library declares an interface IBar which inherits from interface IFoo which is defined in an imported type library Foo.tlb. Inspecting the ITypeInfo representing IBar causes an exception. (Code follows below.)
Before I get to my code, here's how I generated the Bar.tlb type library.
Bar.idl:
[uuid(32E81FDD-BCB0-481B-AD3C-3ED04BFA7D1F)]
library Bar
{
importlib("Foo.tlb");
[uuid(CF062BE8-86D2-4D9B-8D1D-D889A77DA876)]
interface IBar : IFoo { };
}
Foo.idl:
[uuid(22E81FDD-BCB0-481B-AD3C-3ED04BFA7D1E)]
library Foo
{
importlib("stdole32.tlb");
[uuid(BF062BE8-86D2-4D9B-8D1D-D889A77DA875)]
interface IFoo : IUnknown { };
}
I compiled both IDL files using the following commands, which succeeded without any errors or warnings:
midl.exe /mktyplib203 /env win32 /i … /tlb Foo.tlb Foo.idl
midl.exe /mktyplib203 /env win32 /i … /tlb Bar.tlb Bar.idl
Now what I am trying to do is this:
using System;
using System.Runtime.InteropServices;
using ITypeLib = System.Runtime.InteropServices.ComTypes.ITypeLib;
using ITypeInfo = System.Runtime.InteropServices.ComTypes.ITypeInfo;
static class Program
{
[DllImport("oleaut32.dll", CharSet = CharSet.Unicode, PreserveSig = false)]
static extern ITypeLib LoadTypeLibEx(string path, REGKIND regkind);
enum REGKIND { REGKIND_NONE = 2 }
public static void Main()
{
ITypeLib typeLib = LoadTypeLibEx(@"C:\Path\To\Bar.tlb", REGKIND.REGKIND_NONE);
ITypeInfo typeInfo;
typeLib.GetTypeInfo(0, out typeInfo);
IntPtr typeAttrPtr;
typeInfo.GetTypeAttr(out typeAttrPtr); //! COMException: TYPE_E_CANTLOADLIBRARY
… // (HRESULT 0x80029c4a)
}
}
The exception is thrown on the line marked with //!. The inspected ITypeInfo is the one for the IBar interface.
I understand that the Automation API must have trouble locating the inherited interface IFoo, which is contained in a different type library that is also not registered.
But apparently it should be possible to inspect Bar.tlb anyway. OleView.exe manages just fine:
(Yes, it gives a warning about not being able to reconstruct the external type library's filename, which is because I did not register Foo.tlb. That's not what I am worried about.)
If OleView.exe can inspect IBar without crashing, why does my code crash for something as simple as typeInfo.GetTypeAttr()? How do I fix this?
TL;DR: The Automation API appears to access type libraries in the current working directory behind the scenes in order to resolve ITypeInfo references. The issue can be resolved by doing this:
// using static System.IO.Directory;
SetCurrentDirectory(@"C:\Path\To");
ITypeLib typeLib = LoadTypeLibEx("Bar.tlb", REGKIND.REGKIND_NONE);
instead of:
ITypeLib typeLib = LoadTypeLibEx(@"C:\Path\To\Bar.tlb", REGKIND.REGKIND_NONE);
I experimented some more and found out that OleView.exe can only successfully inspect the IBar type details in Bar.tlb if Foo.tlb is present in the same directory. Once I moved Foo.tlb somewhere else, OleView.exe failed just like my own code does.
I wanted to see which API calls OleView.exe used to access type libraries, so I checked:
dumpbin.exe /imports OleView.exe
dumpbin.exe /exports C:\WINDOWS\…\oleaut32.dll
and found that it references LoadTypeLib, not LoadTypeLibEx. The two functions differ in how they handle type library registration. So I looked up the MSDN documentation for LoadTypeLib and found this:
"LoadTypeLib will not register the type library if the path of the type library is specified."
Which gave me the idea to rewrite my code as shown at the beginning of this answer.
It's important to point out, however, that contrary to what the quoted text suggests, it is not the presence or absence of a directory in the path passed to LoadTypeLibEx that matters: it's the call to Directory.SetCurrentDirectory that resolves the error. The following works as well:
// using static System.IO.Directory;
SetCurrentDirectory(@"C:\Path\To");
ITypeLib typeLib = LoadTypeLibEx(@"C:\Path\To\Bar.tlb", REGKIND.REGKIND_NONE);
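Since the current directory is process-wide, it may be worth restoring it afterwards. A minimal sketch, assuming the LoadTypeLibEx and REGKIND declarations from the question; DirectoryScope is a hypothetical helper. Because the failure above surfaced in GetTypeAttr rather than in LoadTypeLibEx, the referenced library appears to be resolved lazily, so the inspection calls probably belong inside the same lambda as the load:

```csharp
using System;
using System.IO;

static class DirectoryScope
{
    // Runs action with the process-wide current directory temporarily set to directory,
    // then restores the previous directory even if action throws.
    // Not safe if other threads depend on the current directory at the same time.
    public static T WithCurrentDirectory<T>(string directory, Func<T> action)
    {
        string previous = Directory.GetCurrentDirectory();
        Directory.SetCurrentDirectory(directory);
        try
        {
            return action();
        }
        finally
        {
            Directory.SetCurrentDirectory(previous);
        }
    }
}
```

For example, DirectoryScope.WithCurrentDirectory(@"C:\Path\To", () => { ... }) with the LoadTypeLibEx("Bar.tlb", REGKIND.REGKIND_NONE), GetTypeInfo and GetTypeAttr calls placed in the lambda body.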

Java JNI call to C# COM fails, when COM is registered without codebase option of regasm

Function calls from Java to C# through JNI-C++/CLI are failing when the C# COM is not registered using regasm with the codebase option. I've built a sample following the instructions in P2: Calling C# from Java with some changes.
Numero uno: C#
Change the C# dll into a COM by creating an interface, IRunner, and making the library assembly COM-visible.
namespace RunnerCOM
{
public interface IRunner
{
String ping();
}
public class Runner:IRunner
{
static void Main(string[] args)
{
}
public Runner() { }
public String ping()
{
return "Alive (C#)";
}
}
}
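The class above carries no COM attributes, so COM visibility depends on assembly-level settings. A sketch of making the registration explicit follows; the two GUIDs are placeholders (generate your own and keep them stable), and the ProgId matches the "RunnerCOM.Runner" string the C++ code passes to IRunnerPtr:

```csharp
using System;
using System.Runtime.InteropServices;

namespace RunnerCOM
{
    [ComVisible(true)]
    [Guid("6B1E2C4A-8D3F-4A57-9C21-0E5F7A9B3D10")] // placeholder interface GUID
    public interface IRunner
    {
        string ping();
    }

    [ComVisible(true)]
    [Guid("3F8A5D2C-1B7E-4C96-A0D4-5E2B9C7F1A63")] // placeholder class GUID (CLSID)
    [ClassInterface(ClassInterfaceType.None)]       // expose only IRunner, no auto-generated class interface
    [ProgId("RunnerCOM.Runner")]                    // ProgID used by the C++ client
    public class Runner : IRunner
    {
        public Runner() { } // COM activation needs a public parameterless constructor
        public string ping() => "Alive (C#)";
    }
}
```

Such an assembly would then be registered with something like regasm.exe RunnerCOM.dll /tlb:RunnerCOM.tlb, adding /codebase if it is not in the GAC, which is the behaviour discussed in the answer below.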
Numero due: Java
No changes made to the Java section.
Numero tre: C++
This part was changed to create a new instance of the RunnerCOM.Runner class and use that result. Here is a good tutorial on how to call managed code from unmanaged code: http://support.microsoft.com/kb/828736
#include "stdafx.h"
#include "Runner.h"
#pragma once
#using <mscorlib.dll>
#import "RunnerCOM.tlb"
JNIEXPORT jstring JNICALL Java_Runner_ping(JNIEnv *env, jobject obj){
RunnerCOM::IRunnerPtr t = RunnerCOM::IRunnerPtr("RunnerCOM.Runner");
BSTR ping = t->ping();
_bstr_t temp(ping, true);
char cap[128];
for(unsigned int i=0;i<temp.length();i++){
cap[i] = (char)ping[i];
}
return env->NewStringUTF(cap);
}
Now to my questions,
The code above fails with a _com_error exception, Class not registered (0x80040154), unless the codebase option is enabled when registering RunnerCOM.dll with regasm.exe. Why is this? If the code is not run from JNI (I tested it as an exe), it works fine. The RunnerCOM.dll is simply found in the working directory.
Casting the _bstr_t temp to char* fails, for example char *out = (char*) temp;. As with the issue above, it works fine when built and executed as an exe, but crashes the JVM when it's a JNI call.
By the way this is what I used to run it as an executable:
int main(){
RunnerCOM::IRunnerPtr t = RunnerCOM::IRunnerPtr("RunnerCOM.Runner");
BSTR ping = t->ping();
_bstr_t temp(ping, false);
printf(temp);
return 0;
}
The codebase option creates a CodeBase entry in the registry. That entry specifies the file path of an assembly that is not installed in the global assembly cache, so when you specify it, the runtime finds the DLL through that path. Without it, the runtime looks for the DLL in the GAC and in the application's base directory, which is the folder of the host executable (java.exe under JNI), not necessarily the folder where your DLL is. You can use Process Explorer to check the process's image path and current directory, and Process Monitor to see which directories the process searches when it looks for the DLL.
In the code that converts the _bstr_t to char*, the string cap is never terminated with '\0', which may cause problems under JNI, since NewStringUTF expects a null-terminated string. Use the _bstr_t operator (char *) instead to obtain a null-terminated string from the _bstr_t object; please check the MSDN example for details.
You mentioned C++/CLI. C++/CLI and a COM wrapper are two different ways to interop with C# code. If you're using C++/CLI as a bridge, you don't need to register the C# DLL as COM; please see this: Calling .Net Dlls from Java code without using regasm.
If you're using COM, you should call CoInitialize() to initialize COM before using it in your code.

Is it possible to use RoGetMetaDataFile from non Windows Store App

After reading Larry Osterman's response on the very same issue I am trying to solve at the moment, I thought I had found the answer to my question.
For the record, the question was: how can I, from .NET (non-WinRT), list the types in a WinRT assembly (mine are apparently .dll files, not .winmd)?
I therefore used the following code snippet:
//note, this wrapper function returns the metadata file name and token
// it immediately releases the importer pointer
static Tuple<string, UInt32> ResolveTypeName(string typename)
{
string path;
object importer = null;
UInt32 token;
try
{
var hr = RoGetMetaDataFile(typename, IntPtr.Zero, out path, out importer, out token);
//TODO: check HR for error
return Tuple.Create(path, token);
}
finally
{
if (importer != null) Marshal.ReleaseComObject(importer); // importer stays null when the call fails
}
}
[DllImport("WinTypes.dll")]
static extern UInt32 RoGetMetaDataFile(
[MarshalAs(UnmanagedType.HString)] string name,
IntPtr metaDataDispenser,
[MarshalAs(UnmanagedType.HString)] out string metaDataFilePath,
[MarshalAs(UnmanagedType.Interface)] out object metaDataImport,
out UInt32 typeDefToken);
(found on https://gist.github.com/2920743)
Unfortunately, I get a non-zero HResult.
I referred to the documentation and found this:
HRESULT_FROM_WIN32(ERROR_NO_PACKAGE): The function was called from a process that is not in a Windows Store app.
Does that mean it is not possible to list the types from .Net (non-WinRT) at all ?
RoGetMetaDataFile is used to load a metadata file from within an app package. It locates the metadata file in which the named type is defined, loads that metadata file, and returns an IMetaDataImport interface pointer that represents that metadata file.
From ordinary .NET code you can call RuntimeEnvironment.GetRuntimeInterfaceAsIntPtr (or GetRuntimeInterfaceAsObject) to get the current runtime's IMetaDataDispenser interface pointer, which can be used to load arbitrary modules for inspection.
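A hedged sketch of that managed route: it assumes .NET Framework (RuntimeEnvironment.GetRuntimeInterfaceAsObject is not supported on .NET Core/.NET 5+), and the CLSID/IID values are the ones I understand cor.h to define for CorMetaDataDispenser, IMetaDataDispenser and IMetaDataImport, so verify them against your SDK. MetadataOpener is a hypothetical name, and declaring IMetaDataImport itself is left out:

```csharp
using System;
using System.Runtime.InteropServices;

// Only the first two methods are declared. COM dispatches by vtable slot,
// so their order must match cor.h (DefineScope, then OpenScope).
[ComImport, Guid("809C652E-7396-11D2-9771-00A0C9B4D50C"),
 InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
interface IMetaDataDispenser
{
    void DefineScope([In] ref Guid rclsid, uint dwCreateFlags, [In] ref Guid riid,
        [MarshalAs(UnmanagedType.IUnknown)] out object ppIUnk);
    void OpenScope([MarshalAs(UnmanagedType.LPWStr)] string szScope, uint dwOpenFlags,
        [In] ref Guid riid, [MarshalAs(UnmanagedType.IUnknown)] out object ppIUnk);
}

static class MetadataOpener
{
    static readonly Guid CLSID_CorMetaDataDispenser = new Guid("E5CB7A31-7512-11D2-89CE-0080C792E5D8");
    static readonly Guid IID_IMetaDataImport = new Guid("7DAC8207-D3AE-4C75-9B67-92801A497D44");
    const uint ofRead = 0; // CorOpenFlags.ofRead

    // Opens a module (.dll or .winmd) for read-only metadata inspection and
    // returns its IMetaDataImport pointer wrapped as a plain RCW.
    public static object OpenForImport(string path)
    {
        var dispenser = (IMetaDataDispenser)RuntimeEnvironment.GetRuntimeInterfaceAsObject(
            CLSID_CorMetaDataDispenser, typeof(IMetaDataDispenser).GUID);
        Guid iid = IID_IMetaDataImport;
        dispenser.OpenScope(path, ofRead, ref iid, out object import);
        return import;
    }
}
```

The returned object would then be cast to an IMetaDataImport declaration of your own to enumerate type definitions.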
From native code, you can call ICLRMetaHost::GetRuntime to load a runtime, then from that object call ICLRRuntimeInfo::GetInterface to get its IMetaDataDispenser interface pointer.
RoGetMetaDataFile can be used from outside the app package; however, it will only resolve system Windows Runtime types.
In order to resolve app specific types, you need to be running with "package identity" - in other words, in the context of a running application.

Error when invoking a method through .NET COM interop

I am trying to use the IOfficeAntiVirus COM interface to invoke a scan using the Microsoft security essentials virus scanner.
I am doing early binding because the documentation says that the IOfficeAntiVirus interface inherits from IUnknown and does not support IDispatch.
[Guid("56FFCC30-D398-11d0-B2AE-00A0C908FA49"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IOfficeAntiVirus
{
void Scan(IntPtr info);
}
[ComImport, Guid("2781761E-28E1-4109-99FE-B9D127C57AFE")]
class SecurityEssentialsAntiVirus
{
}
The parameter for the scan method is a type that comes from this example. That example does the opposite of what I want because it implements the IOfficeAntiVirus interface using a .NET class rather than invoking the scan method. The marshalling and types in the example seem to match the documentation as far as I can tell.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct MSOAVINFO
{
public int cbsize;
[MarshalAs(UnmanagedType.U4)]
public uint uFlags;
public IntPtr hwnd;
[MarshalAs(UnmanagedType.LPWStr)]
public string pwzFullPath;
[MarshalAs(UnmanagedType.LPWStr)]
public string pwzHostname;
[MarshalAs(UnmanagedType.LPWStr)]
public string pwzOrigURL;
}
This is the code I'm trying to use to invoke the scan method:
var antivirus = (IOfficeAntiVirus)new SecurityEssentialsAntiVirus();
var file = new MSOAVINFO();
file.pwzFullPath = @"test.txt";
IntPtr lParam = Marshal.AllocHGlobal(Marshal.SizeOf(file));
Marshal.StructureToPtr(file, lParam, false);
antivirus.Scan(lParam);
It fails on the call to the scan method. I get an exception that says:
"Attempted to read or write protected memory. This is often an
indication that other memory is corrupt."
I'm running this from a 32 bit command line app on a 64 bit system. I have experimented with running the command line program in both 32 bit and 64 bit with no success.
I am sure that the IOfficeAntiVirus guid is correct because I got it from the documentation. Also trying a random GUID that isn't a COM interface causes an error when I try to cast the object to the interface.
I am sure that the SecurityEssentialsAntiVirus guid implements the IOfficeAntiVirus because when I tried casting a different type of COM object to that interface I got an error.
I think the problem might be that the Scan method on the interface hasn't been correctly declared (declaring a random method on the interface gives the same error). I am not working from documentation for the security essentials assembly so it might not implement the interface in the way I imagine (or at all). Does anyone have any idea how to check that?
I can see an MpOAv.dll file in its directory, and that is the same name as the header file in the IOfficeAntiVirus documentation; I know that's not a lot to go on. I can't open that DLL in the OLE/COM Object Viewer to see what is inside. I get a message saying:
IMoniker::BindToObject failed on the file moniker created from (
"C:\Program Files\Microsoft Security Client\MpOAv.dll" ). Bad
extension for file
MK_E_INVALIDEXTENSION ($800401E6)
It also could be that I'm not passing the struct correctly to the Scan method. I have tried about a hundred variations without any luck including using AllocCoTaskMem() and passing the struct by reference.
I really haven't had much experience doing interop (I learned a heap today trying to figure this out!), so I'd really appreciate a push in the right direction. :)
I also spent a couple of days on the same problem, without any results. The most I managed was to run a check without any errors, but in my case it returned an OK status even for malware.
So I continued my research and found that Mozilla uses the IAttachmentExecute API for antivirus checks. Its Save method does the same job as Scan in IOfficeAntiVirus:
IAttachmentExecute::Save may run virus scanners or other trust
services to validate the file before saving it. Note that these
services can delete or alter the file.
There is an implementation of it on CodePlex: http://antivirusscanner.codeplex.com/
One thing I notice is that you didn't set cbsize. file.cbsize=Marshal.SizeOf(file);
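Building on that, a minimal sketch of preparing the Scan argument with cbsize filled in and releasing it afterwards. It reuses the question's MSOAVINFO layout; AvInfoBuffer is a hypothetical helper name, and which uFlags bits must be set should be checked against the IOfficeAntiVirus documentation:

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct MSOAVINFO
{
    public int cbsize;
    public uint uFlags;
    public IntPtr hwnd;
    [MarshalAs(UnmanagedType.LPWStr)] public string pwzFullPath;
    [MarshalAs(UnmanagedType.LPWStr)] public string pwzHostname;
    [MarshalAs(UnmanagedType.LPWStr)] public string pwzOrigURL;
}

static class AvInfoBuffer
{
    // Copies an MSOAVINFO (with cbsize set to its unmanaged size) into unmanaged memory.
    // The caller must release the result with Free.
    public static IntPtr Allocate(string fullPath)
    {
        var info = new MSOAVINFO
        {
            cbsize = Marshal.SizeOf<MSOAVINFO>(),
            pwzFullPath = fullPath
        };
        IntPtr ptr = Marshal.AllocHGlobal(info.cbsize);
        Marshal.StructureToPtr(info, ptr, false);
        return ptr;
    }

    // Frees the native string copies made by StructureToPtr, then the block itself.
    public static void Free(IntPtr ptr)
    {
        Marshal.DestroyStructure<MSOAVINFO>(ptr);
        Marshal.FreeHGlobal(ptr);
    }
}
```

Usage would be along the lines of IntPtr p = AvInfoBuffer.Allocate(Path.GetFullPath("test.txt")); try { antivirus.Scan(p); } finally { AvInfoBuffer.Free(p); }, since pwzFullPath presumably expects an absolute path.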

Get A GUID by Type Name

So I'm trying to do manually what AxImp does (I'm doing it dynamically).
My product is a widely released, sanctioned "add-on" to a third-party product. They have an OCX, which I add to my form with a COM reference... however, if the client has (or installs) an updated version of their product, my product can no longer load the OCX.
Therefore I'm trying to load their OCX dynamically. I've got everything working except that I need the GUID of one of the interfaces in one of their OCXs. I know what the type name is, and the OCX is registered on the system. How can I get an object's GUID just from the type name?
Note, Assembly.LoadFrom() doesn't work because the OCX isn't .NET it's COM.
Since your comment let us know that the GUID is found in the OCX but not registered under HKEY_CLASSES_ROOT, we'll have to read it from the type library:
Call LoadTypeLib or LoadTypeLibEx, passing the path to the .OCX file
Then use the FindName method of the returned object.
Then GetTypeAttr followed by PtrToStructure to get a TYPEATTR structure with the GUID.
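Those three steps can be sketched in C# as follows; TypeLibGuidLookup and FindTypeGuid are hypothetical names, while LoadTypeLib, ITypeLib.FindName, ITypeInfo.GetTypeAttr/ReleaseTypeAttr and TYPEATTR are the standard OLE Automation APIs:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.InteropServices.ComTypes;

static class TypeLibGuidLookup
{
    [DllImport("oleaut32.dll", CharSet = CharSet.Unicode, PreserveSig = false)]
    static extern ITypeLib LoadTypeLib(string fileName);

    // Returns the GUID of the type called typeName in the type library embedded in
    // the .ocx/.dll/.tlb at libraryPath, or null if no type of that name exists.
    public static Guid? FindTypeGuid(string libraryPath, string typeName)
    {
        ITypeLib typeLib = LoadTypeLib(libraryPath);
        var infos = new ITypeInfo[1];
        var memberIds = new int[1];
        short found = 1; // in: array capacity; out: number of matches
        typeLib.FindName(typeName, 0, infos, memberIds, ref found);

        // MEMBERID_NIL (-1) means the name matched a type rather than a member.
        if (found == 0 || infos[0] == null || memberIds[0] != -1) return null;

        infos[0].GetTypeAttr(out IntPtr attrPtr);
        try
        {
            return Marshal.PtrToStructure<TYPEATTR>(attrPtr).guid;
        }
        finally
        {
            infos[0].ReleaseTypeAttr(attrPtr); // free the block allocated by GetTypeAttr
        }
    }
}
```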
Not sure why you can't simply add a COM reference to the DLL to your project.
Visual Studio will automatically add a .NET wrapper to any COM object that is referenced this way.
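I had to write something like that to migrate code from COM to C#. Note that it returns the GUID of the type library itself (TYPELIBATTR.guid), not the GUID of an individual interface:
using System;
using System.Runtime.InteropServices;
using ITypeLib = System.Runtime.InteropServices.ComTypes.ITypeLib;
using TYPELIBATTR = System.Runtime.InteropServices.ComTypes.TYPELIBATTR;
class GetGuid
{
[DllImport("oleaut32.dll", CharSet = CharSet.Unicode, ExactSpelling = true)]
static extern int LoadTypeLib(string fileName, out ITypeLib typeLib);
// Despite its name, this reads the type library embedded in the file; it does not search the registry.
public string SearchRegistry(string dllPath /* or ocx */)
{
var result = string.Empty;
ITypeLib tl;
if (LoadTypeLib(dllPath, out tl) == 0) // 0 == S_OK
{
IntPtr ip;
tl.GetLibAttr(out ip);
if (ip != IntPtr.Zero)
{
var ta = Marshal.PtrToStructure<TYPELIBATTR>(ip);
result = ta.guid.ToString();
tl.ReleaseTLibAttr(ip); // release the block allocated by GetLibAttr
}
}
return result;
}
}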
