I see more and more open source libraries using .NET 5's Source Generators, which improve their performance.
As far as I understand from the docs, they are meant to replace System.Reflection because it comes at the expense of performance. Is that true? What I personally know about source generators is that when they were introduced in .NET 5, they were meant to generate C# code based on .proto data contract files.
There is a clone library of MediatR which uses Source Generators instead of System.Reflection.
Could you simplify the source generators benefits and usage in that MediatR library and overall?
Source generators are really nothing magic - a generator is just some custom piece of code that generates text into files, which then get inserted into the compilation process and become part of the binary output (e.g. a DLL or EXE) as if they had been typed in by hand in some source file before you hit compile.
The only "magic" here is the formalized concept of analyzers-as-generators, which enables Visual Studio to automatically pass in the original source code into your custom generator routine and include the output whenever you build your project.
One application of source generators is to create specialized, type-specific code for some operation that would otherwise require run-time reflection. Run-time reflection is typically rather slow and CPU-intensive, yet often needed in order to centralize logic for common operations on unknown objects. A common example is the serialization and deserialization of objects. This could be done either through reflection (look up properties at run-time, invoke the getters and setters, etc.) or through faster, type-specific code that directly references properties and reads/writes to and from data streams. However, creating such specialized code for every type is a lot of boring, repetitive work, and so - enter source generators. They can do the "reflection" during build time and output slim, fast code into temporary .cs files, which get compiled with the product.
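To make that concrete, here is a minimal sketch of a generator using the classic ISourceGenerator API from .NET 5 (the class name and the emitted code are hypothetical, just to show the moving parts):

using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Text;

[Generator]
public class HelloGenerator : ISourceGenerator
{
    public void Initialize(GeneratorInitializationContext context)
    {
        // No syntax receiver needed for this trivial example; a real generator
        // would register one here to inspect the user's source code.
    }

    public void Execute(GeneratorExecutionContext context)
    {
        // Any text added here is compiled into the consuming project,
        // exactly as if it had been written by hand.
        const string source = @"
namespace Generated
{
    public static class Hello
    {
        public static string Message() => ""Generated at build time"";
    }
}";
        context.AddSource("Hello.g.cs", SourceText.From(source, Encoding.UTF8));
    }
}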
Microsoft.CSharp is required to use the dynamic feature.
I understand there are binders, evaluators and helpers in the assembly.
But why does it have to be language-specific?
Why Microsoft.CSharp and not Microsoft.Dynamic or System.Dynamic?
Please, explain.
Let's say we have d.x where d is dynamic.
C# compiler
1. applies C# language rules
2. gets "property or field access"
3. emits (figuratively) Binder.GetPropertyOrField(d, "x")
Now, being asked to reference Microsoft.CSharp may make one think that a language-agnostic binder can't handle this case, and that something C#-only made its way through compilation and requires a special library.
Compiler had a bad day?
To your first question, it is language-specific because it needs to be.
In C# you call a method with too many arguments and you get an error. In JavaScript, the extra arguments are simply ignored. In C# you access a member that doesn't exist and get an error, while in JavaScript you get undefined. Even if you discovered all these varying feature sets and put them all into System.Core, the next language fad of the month is sure to have some super neat feature that it wouldn't support. It's better to be flexible.
There is common code in .NET core, under the System.Dynamic and System.Runtime.CompilerServices namespaces. It just can't all be common.
And as for your second question, the need for the "special C# library" could of course be removed by expanding these language-specific behaviors inline, but why? That would needlessly bloat your IL code size. It's the same reason you don't write your own Int32.Parse every time you need to read in a number.
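To make that concrete, here is roughly what the compiler's lowering of d.x looks like when written out by hand (a sketch of the generated call site, not the exact emitted code):

using System;
using System.Runtime.CompilerServices;
using Microsoft.CSharp.RuntimeBinder;

class Demo
{
    static object ReadX(object d)
    {
        // The binder captures the calling context (typeof(Demo)); that is
        // where C#-specific rules such as accessibility enter the binding.
        CallSite<Func<CallSite, object, object>> site =
            CallSite<Func<CallSite, object, object>>.Create(
                Binder.GetMember(
                    CSharpBinderFlags.None,
                    "x",
                    typeof(Demo),
                    new[] { CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null) }));
        // The call site caches the resolved rule, so repeated calls are fast.
        return site.Target(site, d);
    }
}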
One reason I can think of - Visual Basic .NET has had late binding in it from day one, primarily oriented around how it interoperates with COM IDispatch interfaces - so if they wanted a language-agnostic binder, they'd have had to adopt the Visual Basic rules, which include that member lookup only works with Public members.
Apparently, the C# designers didn't want to be so strict. You can call this class' DoStuff method from C# via a dynamic reference:
public class Class1
{
    internal void DoStuff()
    {
        Console.WriteLine("Hello");
    }
}
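For example, from code elsewhere in the same assembly (a small usage sketch):

dynamic d = new Class1();
d.DoStuff(); // binds and prints "Hello"; the C# binder applies C#
             // accessibility rules, so 'internal' is visible here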
Whereas attempting to call the same via Visual Basic's Object results in a MissingMemberException at runtime.
So because the C# designers weren't the first to arrive at the late-binding party, they could either follow Visual Basic's lead or they could say "each language will have its own rules" - they went with the latter.
I am reading the book "CLR Via C#", and in the Generics chapter it says:
Source code protection
The developer using a generic algorithm doesn't need to have access to the algorithm's source code. With C++ templates or Java's generics, however, the algorithm's source code must be available to the developer who is using the algorithm.
Can anyone explain what exactly is meant by this?
Well, generic classes are distributed in compiled form, unlike C++ templates, which need to be distributed in full source code. So you do not need to distribute the C# source code of a library that contains generic classes.
This does not prevent the receiver of your class from disassembling it, though (as it is compiled to IL, which can rather easily be decompiled again). To really protect the code, additional measures such as obfuscation are required.
Behind the scenes: This distribution in compiled form is the reason why C# generics and C++ templates also differ in the way they need to be written. C# generic classes and their methods need to be fully defined at the time of compilation; any error in the definition of the generic class or its methods, or any operation on a type parameter that cannot be verified at compile time, will directly produce a compile error. In C++, a template is only compiled at the time of usage, and only the methods actually used are compiled. If you have an undefined operation or even a syntax error in a template definition, you will only see the error when that function is actually instantiated and used.
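For example, in C# the following fails to compile even if Add is never called, whereas the equivalent C++ template would only fail when instantiated with an unsuitable type:

// Compile error CS0019: operator '+' cannot be applied to operands of type 'T' and 'T'
static T Add<T>(T a, T b) => a + b;

// A constraint gives the compiler enough information to verify the method up front:
static T Max<T>(T a, T b) where T : IComparable<T>
    => a.CompareTo(b) >= 0 ? a : b;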
Is there any built-in way to use Roslyn to perform the same compile-time transformations that the C# compiler does, e.g. for transforming iterators, initializers, lambdas, LINQ, etc. into basic C# code?
The Roslyn compiler API is designed to (in addition to translating source code to IL) let you build source code analysis and transformation tools.
However, lambdas and iterators do not have translations that can always be specified using source. They are modeled using the internal bound node abstraction, which includes additional compiler-specific rules that can only be represented using IL.
It would be possible to translate LINQ to source in C#, since it is specified as a source code translation (whether the compiler actually does it that way or not). Yet there is no compiler API that does this specifically. If there were, it would probably show up as a services-layer API and not a compiler API.
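For illustration, the query-expression translation the language specification defines is purely syntactic:

// Query syntax:
var q1 = from x in numbers where x > 1 select x * 2;

// Its specified translation into method calls:
var q2 = numbers.Where(x => x > 1).Select(x => x * 2);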
AFAIK, no, there is no such thing exposed in Roslyn. But the compiler has to do these transformations somehow, so it's possible you will be able to do this by accessing some internal method.
Of course, you could use Roslyn to make these transformations yourself, but that's not what you're asking.
We would like to parse expressions of the type:
Func<T1, bool>, Func<T1, T2, bool>, Func<T1, T2, T3, bool>, etc.
I understand that it is relatively easy to build an expression tree and evaluate it, but I would like to get around the overhead of doing a Compile on the expression tree.
Is there any off the shelf component which can do this?
Is there any component which can parse C# expressions from a string and evaluate them? (Expression services for C#; I think something like this is available for VB, which is used by WF4.)
Edit:
We have specific models on which we need to evaluate expressions entered by IT administrators.
public class SiteModel
{
    public int NumberOfUsers { get; set; }
    public int AvailableLicenses { get; set; }
}
We would like for them to enter an expression like:
Site.NumberOfUsers > 100 && Site.AvailableLicenses < Site.NumberOfUsers
We would then like to generate a Func which can be evaluated by passing a SiteModel object.
Func<SiteModel, bool> (Site) => Site.NumberOfUsers > 100 && Site.AvailableLicenses < Site.NumberOfUsers
Also, the performance should not be miserable (but around 80-100 calls per second on a normal PC should be fine).
Mono.CSharp can evaluate expressions from strings and is very simple to use. The required references come with the Mono compiler and runtime (in the tools directory, IIRC).
You need to reference Mono.CSharp.dll and the Mono C# compiler executable (mcs.exe).
Next set up the evaluator to know about your code if necessary.
using Mono.CSharp;
...
Evaluator.ReferenceAssembly (Assembly.GetExecutingAssembly ());
Evaluator.Run ("using Foo.Bar;");
Then evaluating expressions is as simple as calling Evaluate.
var x = (bool) Evaluator.Evaluate ("0 == 1");
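Applied to the question's scenario, something along these lines should work (an untested sketch - Mono.CSharp's Evaluator has varying support for lambdas across versions, and the namespace is hypothetical):

Evaluator.ReferenceAssembly (typeof (SiteModel).Assembly);
Evaluator.Run ("using MyApp.Models;"); // wherever SiteModel actually lives

var predicate = (Func<SiteModel, bool>) Evaluator.Evaluate (
    "new Func<SiteModel, bool> (Site => Site.NumberOfUsers > 100 && Site.AvailableLicenses < Site.NumberOfUsers);");

bool result = predicate (new SiteModel { NumberOfUsers = 150, AvailableLicenses = 120 }); // true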
Maybe ILCalc (on CodePlex) does what you are looking for. It comes in a .NET and a Silverlight version and is open source.
We have been using it successfully for quite a while. It even allows you to reference variables in your expression.
The "component" you are talking about:
Needs to understand C# syntax (for parsing your input string)
Needs to understand C# semantics (where to perform implicit int->double conversions, etc.)
Needs to generate IL code
Such a "component" is called a C# compiler.
1. The current Microsoft C# compiler is a poor option, as it runs in a separate process (thus increasing compilation time, since all the metadata needs to be loaded into that process) and can only compile full assemblies (and .NET assemblies cannot be unloaded without unloading the whole AppDomain, thus leaking memory). However, if you can live with those restrictions, it's an easy solution - see sgorozco's answer.
2. The future Microsoft C# compiler (Roslyn project) will be able to do what you want, but that is still some time in the future - my guess is that it will be released with the next VS after VS11, i.e. with C# 6.0.
3. The Mono C# compiler (see Mark H's answer) can do what you want, but I don't know whether it supports code unloading or will also leak a bit of memory.
4. Roll your own. You know which subset of C# you need to support, and there are separate components available for the various "needs" above. For example, NRefactory 5 can parse C# code and analyze semantics, and expression trees greatly simplify IL code generation. You could write a converter from NRefactory ResolveResults to expression trees; that would likely solve your problem in less than 300 lines of code. However, NRefactory reuses large parts of the Mono C# compiler in its parser - and if you're taking that big dependency, you might as well go with option 3.
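For reference, hand-building the question's example predicate with expression trees (the code-generation half of option 4, with the parsing step left out) is quite compact:

using System.Linq.Expressions;

var site = Expression.Parameter(typeof(SiteModel), "Site");
var body = Expression.AndAlso(
    Expression.GreaterThan(
        Expression.Property(site, "NumberOfUsers"),
        Expression.Constant(100)),
    Expression.LessThan(
        Expression.Property(site, "AvailableLicenses"),
        Expression.Property(site, "NumberOfUsers")));

// Compile() is the expensive step - compile once, cache, and reuse.
Func<SiteModel, bool> func =
    Expression.Lambda<Func<SiteModel, bool>>(body, site).Compile();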
Perhaps this technique is useful to you - especially regarding dependency reviews, as you are depending solely on framework components.
EDIT: As pointed out by @Asti, this technique creates dynamic assemblies that, unfortunately, due to limitations of the .NET Framework design, cannot be unloaded, so careful consideration should be given before using it. This means that if a script is updated, the old assembly containing the previous version of the script can't be unloaded from memory and will linger until the application or service hosting it is restarted.
In a scenario where scripts change infrequently, and where compiled scripts are cached and reused rather than recompiled on every use, this memory leak can IMO be safely tolerated (this has been the case for all our uses of this technique). Fortunately, in my experience, the memory footprint of the generated assemblies for typical scripts tends to be quite small.
If this is not acceptable, then the scripts can be compiled in a separate AppDomain that can be removed from memory, although this requires marshaling calls between domains (e.g. a named-pipe WCF service, or perhaps an IIS-hosted service, where unloading occurs automatically after an inactivity period or when a memory footprint threshold is exceeded).
End EDIT
First, you need to add a reference to Microsoft.CSharp to your project, and add the following using statements:
using System.CodeDom.Compiler; // this is included in System.Dll assembly
using Microsoft.CSharp;
Then, I'm adding the following method:
private void TestDynCompile() {
    // The code you want to dynamically compile, as a string
    string code = @"
    using System;
    namespace DynCode {
        public class TestClass {
            public string MyMsg(string name) {
                //---- this would be code your users provide
                return string.Format(""Hello {0}!"", name);
                //-----
            }
        }
    }";
    // Obtain a reference to a C# compiler
    var provider = CodeDomProvider.CreateProvider("CSharp");
    // Create an instance of compilation parameters
    var cp = new CompilerParameters();
    // Add assembly dependencies
    cp.ReferencedAssemblies.Add("System.dll");
    // Hold the compiled assembly in memory, don't produce an output file
    cp.GenerateInMemory = true;
    cp.GenerateExecutable = false;
    // Don't produce debugging information
    cp.IncludeDebugInformation = false;
    // Compile the source code
    var rslts = provider.CompileAssemblyFromSource(cp, code);
    if (rslts.Errors.Count == 0) {
        // No compilation errors; obtain the type for DynCode.TestClass
        var type = rslts.CompiledAssembly.GetType("DynCode.TestClass");
        // Create an instance of the dynamically compiled class
        dynamic instance = Activator.CreateInstance(type);
        // Invoke the dynamic code
        MessageBox.Show(instance.MyMsg("Gerardo")); // "Hello Gerardo!" is displayed =)
    }
}
As you can see, you need to add boilerplate code (a wrapper class definition, injecting assembly dependencies, etc.), but this is a really powerful technique that adds scripting capabilities with full C# syntax and executes almost as fast as static code (invocation will be a little bit slower).
Assembly dependencies can refer to your own project's dependencies, so classes and types defined in your project can be referenced and used inside the dynamic code.
Hope this helps!
Not sure about the performance part, but this seems like a good match for Dynamic LINQ...
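For example, with the Dynamic LINQ sample shipped in the VS samples (System.Linq.Dynamic), parsing the question's expression might look like this (a sketch based on the sample's ParseLambda helper):

var lambda = System.Linq.Dynamic.DynamicExpression.ParseLambda<SiteModel, bool>(
    "NumberOfUsers > 100 && AvailableLicenses < NumberOfUsers");
Func<SiteModel, bool> func = lambda.Compile(); // compile once, then reuse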
Generate an XSD from the SiteModel class, then let the administrator input the expression through a web (or other) UI, transform the input via XSL into a functor literal, and then generate and execute it via CodeDom on the fly.
Maybe you can use Lua scripts as input. The user enters a Lua expression and you can parse and execute it with the Lua engine. If needed, you can wrap the input with some other Lua code before you interpret it. I'm not sure about the performance, but 100 calls/s is not that much.
Evaluating expressions is always a security issue, so take care of that, too.
You can use Lua in C#.
Another way would be to compile some C# code that contains the input expression in a class, but then you will end up with one assembly per request. I think .NET 4.0 can unload assemblies, but older versions of .NET can't, so this solution might not scale well. A workaround could be a separate process that is restarted every X requests.
Thanks for your answers.
Introducing a dependency on Mono in a product like ours (which has more than 100K installations and a long release cycle of 1-1.5 years) may not be a good option for us. It might also be overkill, since we only need to support simple expressions (with little or no nesting), not an entire language.
After using the CodeDOM compiler, we noticed that it causes the application to leak memory. Although we could load it in a separate AppDomain to work around this, that again might be overkill.
The dynamic LINQ expression tree sample provided as part of the VS samples has a lot of bugs and no support for type conversions when doing comparisons (converting a string to an int, a double to an int, a long to an int, etc.). The parsing of indexers also seems to be broken. Although not usable off the shelf, it shows promise for our use cases.
We have decided to go with expression trees as of now.