How to create an Assembly (CSharpCompilation) from an existing Assembly in Roslyn? - c#

In .NET Framework 4.8, I could use AssemblyBuilder to create a new Assembly based on an existing Assembly, add some further modifications (such as encryption) and save it as a new .dll with AssemblyBuilder.Save().
In .NET 6, they removed the functionality to save an Assembly to file, along with other methods I used.
As an alternative I looked into Roslyn and how to generate Assemblies. The problem is that I don't have the source code to compile into a new Assembly, only an already compiled Assembly that I want to modify.
My use case would look like this:
private void Protect(FileInfo dllSourceFile, FileInfo targetFile) {
    Assembly assembly = Assembly.LoadFrom(dllSourceFile.FullName);
    CSharpCompilation compilation = CSharpCompilation.Create(assembly);
    //...modify compilation, encrypt files
    var result = compilation.Emit(targetFile.FullName);
}
Is there any functionality in Roslyn (or .NET 6) that helps me here? Even better if it directly uses AssemblyBuilder instead of Assembly.

Related

Load unreferenced Dll

I'm trying to load System.Data.SqlClient dynamically.
The System.Data.SqlClient NuGet package is installed, but the project has no reference to it.
And I don't know the right path to the NuGet directory.
Is there a way to do this dynamically?
Here is my code
internal static Type GetFastType(this string typeName, string assembly)
{
    if (string.IsNullOrEmpty(assembly))
        throw new Exception("AssemblyName cannot be empty");
    if (!assembly.ToLower().EndsWith(".dll"))
        assembly += ".dll";
    var key = typeName + assembly;
    if (CachedStringTypes.ContainsKey(key))
        return CachedStringTypes.Get(key);
    // Assembly.LoadFrom(assembly) // throw exception as the dll is not found
    if (!CachedAssembly.ContainsKey(assembly))
        CachedAssembly.Add(assembly, Assembly.LoadFrom(assembly));
    return CachedStringTypes.GetOrAdd(key, CachedAssembly.Get(assembly).GetType(typeName, true, true));
}
And here is how I run it
var type = "System.Data.SqlClient.SqlConnection".GetFastType("System.Data.SqlClient");
Required reading:
Read this MSDN article: Best Practices for Assembly Loading
In short:
It looks like you're assuming the System.Data.SqlClient.SqlConnection class always exists inside System.Data.SqlClient.dll.
This is an incorrect assumption:
A NuGet package is not a .NET assembly.
A NuGet package does not map 1:1 with a .NET assembly nor namespaces.
A NuGet package can contain multiple assemblies.
A NuGet package can contain zero assemblies.
A NuGet package can contain assemblies that don't have any types defined in them at all!
They could be assemblies that only contain Resources or other embedded items
They could be assemblies that use Type-Forwarding to redirect types that previously existed in this assembly to other assemblies. Only the JIT uses this feature, however, not reflection.
And those "forwarded-to" assemblies don't have to exist in NuGet packages either: they can be "in-box" assemblies built-in to the runtime (like mscorlib.dll and System.Data.dll).
They could be stub assemblies that don't provide any types when those types are already provided by the Base Class Library - the NuGet package only exists to provide those types for other platforms.
This is the situation you're dealing with.
A NuGet package can have very different effects based on the project's target (.NET Framework, .NET Standard, .NET Core, etc)
Your code cannot assume that a specific class is located in a specific assembly file - this breaks .NET's notion of backwards-compatibility through type-forwarding.
In your case...
In your case, your code assumes System.Data.SqlClient.SqlConnection exists inside an assembly file named System.Data.SqlClient. This assumption is false in many cases, but true in some cases.
Here is the top-level directory structure of the System.Data.SqlClient NuGet package:
Observe how inside the package there are subdirectories for each supported target (in this case, MonoAndroid10, MonoTouch10, net46, net451, net461, netcoreapp2.1, netstandard1.2, etc). For each of these targets the package provides different assemblies:
When targeting .NET Framework 4.5.1, .NET Framework 4.6 or .NET Framework 4.6.1 the files from the net451, net46 and net461 directories (respectively) will be used. These folders contain a single file named System.Data.SqlClient.dll which does not contain any classes. This is because when you target the .NET Framework 4.x, the System.Data.SqlClient (namespace) types are already provided by the Base Class Library inside System.Data.dll, so there is no need for any additional types. (So if you're building only for .NET Framework 4.x then you don't need the System.Data.SqlClient NuGet package at all.)
Here's a screenshot of the insides of that assembly file using the .NET Reflector tool (a tool which lets you see inside and decompile .NET assemblies) if you don't believe me:
When targeting other platforms via .NET Standard (i.e. where System.Data.dll isn't included by default, or when System.Data.dll does not include SqlClient) then the NuGet package will use the netstandard1.2, netstandard1.3, netstandard2.0 directories, which do contain a System.Data.SqlClient.dll that does contain the System.Data.SqlClient namespace with the types that you're after. Here's a screenshot of that assembly:
And other platforms like MonoAndroid, MonoTouch, xamarinios, xamarintvos, etc also have their own specific version of the assembly file (or files!).
But even if you know your program will only run on a single specific platform where a specific NuGet package contains an assembly DLL that contains a specific type - it's still "wrong" because of type-forwarding: https://learn.microsoft.com/en-us/dotnet/framework/app-domains/type-forwarding-in-the-common-language-runtime
While Type-Forwarding means that most programs that reference types in certain assemblies will continue to work fine, it does not apply to reflection-based assembly-loading and type-loading, which is what your code does. Consider this scenario:
A new version of the System.Data.SqlClient NuGet package comes out that now has two assemblies:
System.Data.SqlClient.dll (which is the same as before, except SqlConnection is removed but has a [TypeForwardedTo] attribute set that cites System.Data.SqlClient.SqlConnection.dll).
System.Data.SqlClient.SqlConnection.dll (the SqlConnection class now lives in this assembly).
Your code will now break because it explicitly loads only System.Data.SqlClient.dll and not System.Data.SqlClient.SqlConnection.dll and enumerates those types.
Here be dragons...
Now, assuming you're prepared to disregard all of that advice and still write programs that assume a specific type exists in a specific assembly, then the process is straightforward:
// Persistent state (static, because the static methods below use it):
static readonly Dictionary<String,Assembly> loadedAssemblies = new Dictionary<String,Assembly>();
static readonly Dictionary<(String assembly, String typeName),Type> typesByAssemblyAndName = new Dictionary<(String assembly, String typeName),Type>();
// The function:
static Type GetExpectedTypeFromAssemblyFile( String assemblyFileName, String typeName )
{
    var t = ( assemblyFileName, typeName );
    if( !typesByAssemblyAndName.TryGetValue( t, out Type type ) )
    {
        if( !loadedAssemblies.TryGetValue( assemblyFileName, out Assembly assembly ) )
        {
            assembly = Assembly.LoadFrom( assemblyFileName );
            loadedAssemblies[ assemblyFileName ] = assembly;
        }
        type = assembly.GetType( typeName, throwOnError: true ); // throws if the type doesn't exist
        typesByAssemblyAndName[ t ] = type;
    }
    return type;
}
// Usage:
static IDbConnection CreateSqlConnection()
{
    const String typeName = "System.Data.SqlClient.SqlConnection";
    const String assemblyFileName = "System.Data.SqlClient.dll";
    Type sqlConnectionType = GetExpectedTypeFromAssemblyFile( assemblyFileName, typeName );
    Object sqlConnectionInstance = Activator.CreateInstance( sqlConnectionType ); // Creates an instance of the specified type using that type's default constructor.
    return (IDbConnection)sqlConnectionInstance;
}
For anyone who may have had the same problem, I found the solution.
Here is how you can load the right type the right way:
var type = Type.GetType($"{typeName}, {assembly}");
e.g.
var type = Type.GetType("System.Data.SqlClient.SqlConnection, System.Data.SqlClient");
This way, it should load the DLL dynamically.
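The assembly-qualified lookup above can be sketched as a small helper. This is only an illustration: the names TypeLookup and FindType are hypothetical, and the usage line resolves System.Uri (rather than SqlConnection) so that it runs without any NuGet package installed. Type.GetType can only succeed if the runtime is able to locate the named assembly through its normal probing rules.

```csharp
using System;

static class TypeLookup
{
    // Hypothetical helper: builds "TypeName, AssemblyName" and asks the runtime to resolve it.
    // With throwOnError: false, a missing type or assembly yields null rather than an exception
    // (a few failures, such as a malformed assembly name, can still throw).
    public static Type FindType(string typeName, string assemblyName)
    {
        return Type.GetType(typeName + ", " + assemblyName, throwOnError: false);
    }
}

// Usage (hypothetical):
//   Type uriType = TypeLookup.FindType("System.Uri", typeof(Uri).Assembly.FullName);
//   if (uriType == null) { /* the assembly or the type could not be found */ }
```

Returning null instead of throwing leaves the decision about a missing provider to the caller, which matters when the assembly is optional.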
I think you have to provide the full path to LoadFrom(...). You should be aware of the probing path of the application, so just concatenate that path with the name of the assembly. I don't think it is straightforward to load from a path that is not in the probing path, unless you do some tricks with the app domain.

CodeDom Reference ActiveX Resource

Very Similar Unanswered Question Here: CodeDom Reference VB6 dll
I've been searching google for the last few days regarding an issue referencing an ActiveX/COM assembly using an in-memory assembly compiled using CodeDom (Visual Basic Provider).
I'll try my best to explain what I'm trying to do and, sorry ahead of time if the technical lingo is off!
Visual Studio Process - Working!
Open Visual Studio.
Create new Windows Forms project (C#).
Right-click References -> Add Reference...
Select the Executable I want (Executable may be important here?)
The operation generates two references.
These classes, methods, etc. become available for us as I want.
CodeDom - Not working!
Pre-Work Steps Taken!
Used TlbImp.exe on the Executable to generate the two .dll files (Interop).
Moved the generated .dll files to an accessible location (Desktop!)
CodeDom Code Example
string sCodeLocation = @"C:\Interop\Code.vb";
string[] sCodeContents = { File.ReadAllText(sCodeLocation) };
List<string> ReferenceList = new List<string>();
ReferenceList.Add("System.dll");
ReferenceList.Add(@"C:\Users\xxxxx\Desktop\Interops\CustomActiveX.dll");
string[] References = ReferenceList.ToArray();
CompilerParameters CompilerParams = new CompilerParameters();
CompilerParams.GenerateInMemory = true;
CompilerParams.TreatWarningsAsErrors = false;
CompilerParams.CompilerOptions = "/optimize /platform:x86";
CompilerParams.ReferencedAssemblies.AddRange(References);
VBCodeProvider Provider = new VBCodeProvider();
CompilerResults CompileResult = Provider.CompileAssemblyFromSource(CompilerParams, sCodeContents);
The Exception I'm Receiving
System.IO.FileNotFoundException: 'Could not load file or assembly 'CustomActiveX, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' or one of its dependencies. The system cannot find the file specified.'
List Of Attempted Resolutions (Based on Google Searches)
Ensure platform is set across the board (x86 in this case).
Use TlbImp.exe to create Interop DLLs.
Tried referencing with CompilerParameters.EmbeddedResources (But don't understand it)
Tried referencing CompilerParameters.LinkedResources (But don't understand it)
Possible Workaround (Not tested -- not preferred)
As a workaround, I thought about maybe having my CodeDom referenced snippet of code load the assembly dynamically on its end. However, if Visual Studio can reference the COM / ActiveX object and use it, I want to make CodeDom do the same.

Referencing embedded assemblies from dynamically created classes

I have been using the technique of embedding dlls (embedded resource) into an exe and using the following code to resolve the unknown dlls at runtime.
AppDomain.CurrentDomain.AssemblyResolve += (sender, args) =>
{
    String resourceName = "Project.lib." + new AssemblyName(args.Name).Name + ".dll";
    using (var stream = Assembly.GetExecutingAssembly().GetManifestResourceStream(resourceName))
    {
        Byte[] assemblyData = new Byte[stream.Length];
        stream.Read(assemblyData, 0, assemblyData.Length);
        return Assembly.Load(assemblyData);
    }
};
However, when I embed the Spark View Engine dll (for example) it falls over, but only in one particular place. Spark itself dynamically generates classes on the fly. These classes then reference Spark (using Spark etc). It is at this point I get the following error.
The type 'Spark.Class' is defined in an assembly that is not referenced. You must add a reference to the assembly 'Spark'
I'm pretty sure that this has nothing to do with the Spark view engine but to do with referencing an embedded assembly from within a dynamically generated class.
Update: stacktrace
An Exception has occurred when running the Project Tasks
Message: Spark.Compiler.BatchCompilerException: Dynamic view compilation failed.
c:\Users\Adam\AppData\Local\Temp\kdsjyhvu.0.cs(6,14): error CS0012: The type 'Spark.AbstractSparkView' is defined in an assembly that is not referenced. You must add a reference to assembly 'Spark, Version=1.5.0.0, Culture=neutral, PublicKeyToken=7f8549eed921a12c'
at Spark.Compiler.BatchCompiler.Compile(Boolean debug, String languageOrExtension, String[] sourceCode)
at Spark.Compiler.CSharp.CSharpViewCompiler.CompileView(IEnumerable`1 viewTemplates, IEnumerable`1 allResources)
at Spark.SparkViewEngine.CreateEntryInternal(SparkViewDescriptor descriptor, Boolean compile)
at Spark.SparkViewEngine.CreateEntry(SparkViewDescriptor descriptor)
at Spark.SparkViewEngine.CreateInstance(SparkViewDescriptor descriptor)
at ProjectTasks.Core.Templater.Populate(String templateFilePath, Object data) in \ProjectTasks\Core\Templater.cs:line 33
at ProjectTasks..Core.EmailTemplates.RenderImpl(String name, Object data) in \ProjectTasks\Core\EmailTemplates.cs:line 19
at ProjectTasks.Tasks.EmailUsersWithIncompleteModules.Run() in \ProjectTasks\Tasks\EmailUsersWithIncompleteModules.cs:line 41
at ProjectTasks.MaintenanceTaskRunner.Run(Boolean runNow, IMaintenanceTask[] tasks) in \ProjectTasks\MaintenanceTaskRunner.cs:line 25
at ProjectTasks.Initialiser.Init(String[] args) in \ProjectTasks\Initialiser.cs:line 30
Anyone have any ideas on a resolution if indeed there is one at all?
I guess Spark uses CodeDom for dynamic code generation. CSharpCodeProvider internally generates source code and runs csc.exe to obtain new types. Since csc.exe needs physical files as references then AssemblyResolve trick will not help in this case.
The stack trace strongly suggests that Spark is using System.CodeDom to dynamically generate assemblies. That requires reference assemblies to be files on disk, the C# compiler runs out-of-process. This is normally not a problem because you'd have Spark.dll in the same directory as your EXE.
You cannot make this work.
Fwiw: this technique is horribly wasteful of system resources. You double the amount of memory required for assemblies. It is the expensive kind of memory as well, it cannot be shared between processes and is backed by the paging file instead of the assembly file. You can also buy yourself some serious type identity trouble. .NET already supports deployment in a single file. It is called setup.exe
As others have said, the problem lies with the fact that the CodeDom produces artifacts on disk that it then subsequently needs access to in order to render the views.
Apart from the fact that embedding Spark is a potential memory hog anyway, I believe there's a potential solution to this problem. Given that the problem is caused by dynamic view generation on the fly, why not take advantage of Spark's batch compilation option to generate the DLLs for your views as part of your build?
You can use code similar to the following to achieve this:
var factory = new SparkViewFactory(settings)
{
    ViewFolder = new FileSystemViewFolder(viewsLocation)
};
// And generate all of the known view/master templates into the target assembly
var batch = new SparkBatchDescriptor(targetPath);
factory.Precompile(batch);
In the end, you should have an output dll which will contain compiled views, and you can then embed that dll the same way you are embedding the main Spark.dll.
Hope that helps
Rob

Resolving Assemblies, the fuzzy way

Here's the setup:
A pure DotNET class library is loaded by an unmanaged desktop application. The Class Library acts as a plugin. This plugin loads little baby plugins of its own (all DotNET Class Libraries), and it does so by reading the dll into memory as a byte-stream, then
Assembly asm = Assembly.Load(COFF_Image);
The problem arises when those little baby plugins have references to other dlls. Since they are loaded via the memory rather than directly from the disk, the framework often cannot find these referenced assemblies and is thus incapable of loading them.
I can add an AssemblyResolve handler to my project and I can see these referenced assemblies drop past. I have a reasonably good idea about where to find these referenced assemblies on the disk, but how can I make sure that the Assembly I load is the correct one?
In short, how do I reliably go from the System.ResolveEventArgs.Name field to a dll file path (presuming I have a list of all the folders where this dll could be hiding)?
When I have used this in the past, we just compared the file name with the name part of ResolveEventArgs.Name. If you want to be sure that you are loading the exact same version, I suppose you could check whether the names match and, if they do, load the assembly and then check the assembly's full name against ResolveEventArgs.Name.
Something along these lines:
string name = GetAssemblyName (args); //gets just the name part of the assembly name
foreach (string searchDirectory in m_searchDirectories)
{
    string assemblyPath = Path.Combine (executingAssemblyPath, searchDirectory);
    assemblyPath = Path.Combine (assemblyPath, name + ".dll");
    if (File.Exists (assemblyPath))
    {
        Assembly assembly = Assembly.LoadFrom (assemblyPath);
        if (assembly.FullName == args.Name)
            return assembly;
    }
}
for completeness:
private string GetAssemblyName (ResolveEventArgs args)
{
    String name;
    if (args.Name.IndexOf (",") > -1)
    {
        name = args.Name.Substring (0, args.Name.IndexOf (","));
    }
    else
    {
        name = args.Name;
    }
    return name;
}
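A variation on the approach above, as a rough sketch only: AssemblyName.GetAssemblyName reads a file's identity without adding the assembly to the AppDomain, so a candidate can be checked before it is loaded. The names AssemblyLocator and FindAssemblyPath are hypothetical, and the sketch assumes the file is named after the assembly's simple name with a .dll extension.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

static class AssemblyLocator
{
    // Hypothetical helper: returns the path of the first matching DLL found in
    // searchDirectories, or null if none matches. requestedName is the value of
    // ResolveEventArgs.Name (a full display name or just a simple name).
    public static string FindAssemblyPath(string requestedName, IEnumerable<string> searchDirectories)
    {
        var requested = new AssemblyName(requestedName);
        foreach (string directory in searchDirectories)
        {
            string candidatePath = Path.Combine(directory, requested.Name + ".dll");
            if (!File.Exists(candidatePath))
                continue;
            try
            {
                // Reads the manifest only; the assembly is not added to the AppDomain.
                AssemblyName candidate = AssemblyName.GetAssemblyName(candidatePath);
                // A request without a version is treated as a simple-name match;
                // otherwise the full identity (version, culture, key token) must be equal.
                if (requested.Version == null ||
                    string.Equals(candidate.FullName, requested.FullName, StringComparison.OrdinalIgnoreCase))
                    return candidatePath;
            }
            catch (BadImageFormatException)
            {
                // Not a managed assembly; keep searching the other directories.
            }
        }
        return null;
    }
}
```

Inside an AssemblyResolve handler, the returned path could then be passed to Assembly.LoadFrom, with a null result meaning the handler should return null.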
The Managed Extensibility Framework (MEF) sounds like something that'll solve all your problems. It can scan folders to locate DLLs, resolve dependencies for any depth and manages plug-in composition in general. Each part (or 'plug-in') just has to declare what it needs and what it provides, and MEF takes care of the wiring. If MEF succeeded in taming VS2010's extensibility beast, then it can handle anything.
I've never had luck with AssemblyResolver. I usually do one of these three:
Require plugins not have external references that are not in the GAC. If they bitch, tell them to ILMerge.
Require plugins to dump all their dlls into a known plugin directory. Load all assemblies in that directory into memory.
Require that plugin dependencies exist in a path that is probed by fusion. You can figure out where the binder is looking for assemblies by turning on the fusion log (fuslogvw.exe--don't forget to reboot after turning on logging!).
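The second option in the list above (load every DLL found in a known plugin directory) might look roughly like the following. This is a sketch under assumptions, not a tested plugin host: the names PluginDirectoryLoader and LoadAll are hypothetical, and files that are not loadable .NET assemblies are simply skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

static class PluginDirectoryLoader
{
    // Hypothetical helper: loads every *.dll in pluginDirectory and returns
    // the assemblies that loaded successfully.
    public static List<Assembly> LoadAll(string pluginDirectory)
    {
        var loaded = new List<Assembly>();
        foreach (string path in Directory.GetFiles(pluginDirectory, "*.dll"))
        {
            try
            {
                loaded.Add(Assembly.LoadFrom(path));
            }
            catch (BadImageFormatException)
            {
                // Native or otherwise non-.NET DLL; skip it.
            }
            catch (FileLoadException)
            {
                // The file was found but could not be loaded; skip it.
            }
        }
        return loaded;
    }
}
```

Because Assembly.LoadFrom records the directory it loaded from, dependencies sitting next to a plugin in the same folder can usually be found without a custom resolver.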

Embedding assemblies inside another assembly

If you create a class library that uses things from other assemblies, is it possible to embed those other assemblies inside the class library as some kind of resource?
I.e. instead of having MyAssembly.dll, SomeAssembly1.dll and SomeAssembly2.dll sitting on the file system, those other two files get bundled in to MyAssembly.dll and are usable in its code.
I'm also a little confused about why .NET assemblies are .dll files. Didn't this format exist before .NET? Are all .NET assemblies DLLs, but not all DLLs are .NET assemblies? Why do they use the same file format and/or file extension?
ILMerge does merge assemblies, which is nice, but sometimes not quite what you want. For example, when the assembly in question is a strongly-named assembly, and you don't have the key for it, then you cannot do ILMerge without breaking that signature. Which means you have to deploy multiple assemblies.
As an alternative to ilmerge, you can embed one or more assemblies as resources into your exe or DLL. Then, at runtime, when the assemblies are being loaded, you can extract the embedded assembly programmatically, and load and run it. It sounds tricky but there's just a little bit of boilerplate code.
To do it, embed an assembly, just as you would embed any other resource (image, translation file, data, etc). Then, set up an AssemblyResolver that gets called at runtime. It should be set up in the static constructor of the startup class. The code is very simple.
static NameOfStartupClassHere()
{
    AppDomain.CurrentDomain.AssemblyResolve += new ResolveEventHandler(Resolver);
}
static System.Reflection.Assembly Resolver(object sender, ResolveEventArgs args)
{
    Assembly a1 = Assembly.GetExecutingAssembly();
    Stream s = a1.GetManifestResourceStream(args.Name);
    byte[] block = new byte[s.Length];
    s.Read(block, 0, block.Length);
    Assembly a2 = Assembly.Load(block);
    return a2;
}
The Name property on the ResolveEventArgs parameter is the name of the assembly to be resolved. This name refers to the resource, not to the filename. If you embed the file named "MyAssembly.dll", and call the embedded resource "Foo", then the name you want here is "Foo". But that would be confusing, so I suggest using the filename of the assembly for the name of the resource. If you have embedded and named your assembly properly, you can just call GetManifestResourceStream() with the assembly name and load the assembly that way. Very simple.
This works with multiple assemblies, just as nicely as with a single embedded assembly.
In a real app you're gonna want better error handling in that routine - like what if there is no stream by the given name? What happens if the Read fails? etc. But that's left for you to do.
In the rest of the application code, you use types from the assembly as normal.
When you build the app, you need to add a reference to the assembly in question, as you would normally. If you use the command-line tools, use the /r option in csc.exe; if you use Visual Studio, you'll need to "Add Reference..." in the popup menu on the project.
At runtime, assembly version-checking and verification works as usual.
The only difference is in distribution. When you deploy or distribute your app, you need not distribute the DLL for the embedded (and referenced) assembly. Just deploy the main assembly; there's no need to distribute the other assemblies because they're embedded into the main DLL or EXE.
Take a look at ILMerge for merging assemblies.
I'm also a little confused about why .NET assemblies are .dll files. Didn't this format exist before .NET?
Yes.
Are all .NET assemblies DLLs,
Either DLLs or EXE normally - but can also be netmodule.
but not all DLLs are .NET assemblies?
Correct.
Why do they use the same file format and/or file extension?
Why should it be any different - it serves the same purpose!
You can embed an assembly (or any file, actually) as a resource (and then use the ResourceManager class to access them), but if you just want to combine assemblies, you're better off using a tool like ILMerge.
EXE and DLL files are Windows portable executables, which are generic enough to accommodate future types of code, including any .NET code (they can also run in DOS but only display a message saying that they're not supposed to run in DOS). They include instructions to fire up the .NET runtime if it isn't already running. It's also possible for a single assembly to span across multiple files, though this is hardly ever the case.
Note ILMerge doesn't work with embedded resources like XAML, so WPF apps etc will need to use Cheeso's method.
There's also the mkbundle utility offered by the Mono project
Why do they use the same file format and/or file extension?
Why should it be any different - it serves the same purpose!
My 2¢ bit of clarification here: DLL is Dynamic Link Library. Both the old style .dll (C-code) and .net style .dll are by definition "dynamic link" libraries. So .dll is a proper description for both.
With respect to Cheeso's answer of embedding the assemblies as resources and loading them dynamically using the Load(byte[]) overload using an AssemblyResolve event handler, you need to modify the resolver to check the AppDomain for an existing instance of the Assembly to load and return the existing assembly instance if it's already loaded.
Assemblies loaded using that overload do not have a context, which can cause the framework to try and reload the assembly multiple times. Without returning an already loaded instance, you can end up with multiple instances of the same assembly code and types that should be equal but won't be, because the framework considers them to be from two different assemblies.
At least one way that multiple AssemblyResolve events will be made for the same assembly loaded into the "No context" is when you have references to types it exposes from multiple assemblies loaded into your AppDomain, as code executes that needs those types resolved.
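A resolver that first checks the AppDomain for an already loaded instance, as suggested above, could be sketched like this. The names EmbeddedAssemblyResolver, Register and FindLoaded are hypothetical, and the sketch assumes each embedded resource is named after the assembly's simple name plus ".dll"; the real resource names depend on how the project embeds the files.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection;

static class EmbeddedAssemblyResolver
{
    public static void Register()
    {
        AppDomain.CurrentDomain.AssemblyResolve += Resolve;
    }

    // Returns an assembly with this exact full name if one is already loaded, else null.
    public static Assembly FindLoaded(string fullName)
    {
        return AppDomain.CurrentDomain.GetAssemblies()
            .FirstOrDefault(a => string.Equals(a.FullName, fullName, StringComparison.OrdinalIgnoreCase));
    }

    static Assembly Resolve(object sender, ResolveEventArgs args)
    {
        // Reuse an existing instance so the same types are not loaded twice.
        Assembly existing = FindLoaded(args.Name);
        if (existing != null)
            return existing;

        // Assumption: the resource is named "<SimpleName>.dll".
        string resourceName = new AssemblyName(args.Name).Name + ".dll";
        using (Stream stream = Assembly.GetExecutingAssembly().GetManifestResourceStream(resourceName))
        {
            if (stream == null)
                return null; // not embedded; let other handlers or the default probing deal with it
            using (var buffer = new MemoryStream())
            {
                stream.CopyTo(buffer); // copies the whole stream, unlike a single Read call
                return Assembly.Load(buffer.ToArray());
            }
        }
    }
}
```

Register would be called once, early in startup (for example from the static constructor of the startup class), before any type from an embedded assembly is touched.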
https://msdn.microsoft.com/en-us/library/dd153782%28v=vs.110%29.aspx
A couple of salient points from the link:
"Other assemblies cannot bind to assemblies that are loaded without context, unless you handle the AppDomain.AssemblyResolve event"
"Loading multiple assemblies with the same identity without context can cause type identity problems similar to those caused by loading assemblies with the same identity into multiple contexts. See Avoid Loading an Assembly into Multiple Contexts."
I would suggest you try Costura.Fody. Just don't forget to Install-Package Fody before Costura.Fody (in order to get the newest Fody!)
