I've been thinking about whether it's possible to apply the DI pattern without incurring the cost of virtual method calls (which according to experiments I did can be up to 4 times slower than non-virtual calls). The first idea I had was to do dependency injection via generics:
sealed class ComponentA<TComponentB, TComponentC> : IComponentA
where TComponentB : IComponentB
where TComponentC : IComponentC
{ ... }
Unfortunately, the CLR still dispatches these calls through the interfaces even when concrete implementations of TComponentB and TComponentC are supplied as generic type arguments and all of the classes are declared sealed. The only way to get the CLR to make non-virtual calls is to change all of the classes to structs (which implement the interfaces). Using structs doesn't really make sense for DI, though, and makes the issue below even harder to solve.
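For illustration, here's a minimal sketch of the struct variant I mean (the component names are just placeholders). As far as I can tell, because the JIT specializes generic code for each value-type argument, the call through TComponentB below is resolved directly (and can even be inlined) instead of going through the interface:

public interface IComponentB { int Calculate(int x); }

// A struct implementation lets the JIT specialize ComponentA<T> per type
// argument, so Calculate is not dispatched through the interface.
public struct ComponentB : IComponentB
{
    public int Calculate(int x) { return x * 2; }
}

public sealed class ComponentA<TComponentB> where TComponentB : IComponentB
{
    private readonly TComponentB _b;
    public ComponentA(TComponentB b) { _b = b; }

    // Direct call for struct type arguments; interface call for classes.
    public int Run(int x) { return _b.Calculate(x); }
}

// Usage: new ComponentA<ComponentB>(new ComponentB()).Run(21);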
The second issue with the above solution is that it can't handle circular references. I can't think of any way, either in C# code or by constructing expression trees, to handle circular references, because that would entail infinitely recursing generic types. (.NET does support a generic type referencing itself, but that doesn't seem to generalize to this case.) Since only structs make the CLR bypass the interfaces, I don't think this problem is solvable at all: a circular reference between structs would require each struct to contain the other by value, which is impossible.
There's only one other solution I can think of, and it's guaranteed to work: emit all of the classes from scratch at runtime, perhaps using the compiled classes as templates. Not really an ideal solution, though.
Anyone have better ideas?
Edit: Regarding most of the comments, I should say that this is filed under "pure intellectual curiosity." I debated whether to ask this because I realize I don't have any concrete case in which it's necessary. I was just thinking about it for fun and wondering whether anyone else has come across this before.
Typical example of trying to completely over-engineer something, in my opinion. Don't compromise your design just because you can save a few tens of milliseconds - if it is even that.
Are you seriously suggesting that because of the callvirt instructions, your app ends up being so significantly slower that users (those people you write the app for) will notice any difference - at all? I doubt that very much.
This blog post explains why you can't optimize the virtual call.
While a callvirt instruction does take longer, the compiler usually emits it even for non-virtual calls because it provides a cheap null check for the CLR prior to making the call to the method. A callvirt shouldn't take significantly longer than a call instruction, especially considering the null check you get.
Have you found that you could significantly improve the performance of your application by creating types (either structs or classes with static methods) that allow you to guarantee that the C# compiler will emit call instructions rather than callvirt instructions?
The reason I ask is that I am wondering if you are going to create an unmaintainable code base that is brittle and hard to use simply to solve a problem that may or may not exist.
Related
It is commonly said that object casting is a bad practice and should be avoided; for instance, the question Why should casting be avoided? has received some answers with great arguments:
By Jerry Coffin:
Looking at things more generally, the situation's pretty simple (at least IMO): a cast (obviously enough) means you're converting something from one type to another. When/if you do that, it raises the question "Why?" If you really want something to be a particular type, why didn't you define it to be that type to start with? That's not to say there's never a reason to do such a conversion, but anytime it happens, it should prompt the question of whether you could re-design the code so the correct type was used throughout.
By Eric Lippert:
Both kinds of casts are red flags. The first kind of cast raises the question "why exactly is it that the developer knows something that the compiler doesn't?" If you are in that situation then the better thing to do is usually to change the program so that the compiler does have a handle on reality. Then you don't need the cast; the analysis is done at compile time.
The second kind of cast raises the question "why isn't the operation being done in the target data type in the first place?" If you need a result in ints then why are you holding a double in the first place? Shouldn't you be holding an int?
Moving on to my question: recently I started looking into the source code of the well-known open source project AutoFixture, originally developed by Mark Seemann, which I really appreciate.
One of the main components of the library is the ISpecimenBuilder interface, which defines a rather abstract method:
object Create(object request, ISpecimenContext context);
As you can see, the request parameter is of type object, so it accepts completely different types. Different implementations of the interface handle different requests according to their runtime type, checking whether the request is something they are capable of dealing with and otherwise returning some kind of "no response" representation.
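For example, a typical implementation sketch might look like this (the builder below is hypothetical; NoSpecimen is AutoFixture's "no response" marker, whose exact constructor has varied between versions, and the usings for System and the AutoFixture kernel namespace are assumed):

// A hypothetical builder that only knows how to handle requests for strings.
public class StringSpecimenBuilder : ISpecimenBuilder
{
    public object Create(object request, ISpecimenContext context)
    {
        // Runtime type check: is this a request for the string type?
        var type = request as Type;
        if (type == typeof(string))
            return Guid.NewGuid().ToString();

        // Not something this builder can deal with; signal "no response".
        return new NoSpecimen();
    }
}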
It seems that the design of the interface does not adhere to the "good practice" that object casting should be used sparingly.
I was wondering whether there is a better way to design this contract, one that avoids all the casting, but I couldn't find any solution.
Obviously the object parameter could be replaced with some marker interface, but that would not solve the casting problem. I have also thought that it might be possible to use some variation of the visitor pattern, as described here, but it does not seem very scalable: the visitor would have to have dozens of different methods, since there are so many different implementations of the interface, each capable of dealing with different types of requests.
Although I basically agree with the arguments against using casting as part of a good design, in this specific scenario it seems to be not only the best option but also the only realistic one.
To sum up: are object casting and very general contracts an inevitability when there is a need to design a modular and extensible architecture?
I don't think that I can answer this question generally, for any type of application or framework, but I can offer an answer that specifically talks about AutoFixture, as well as offer some speculation about other usage scenarios.
If I had to write AutoFixture from scratch today, there are certainly things I'd do differently. Particularly, I wouldn't design the day-to-day API around something like ISpecimenBuilder. Rather, I'd design the data manipulation API around the concept of functors and monads, as outlined here.
This design is based entirely on generics, but it does require statically typed building blocks (also described in the article) known at compile time.
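To make that concrete, here's a minimal, hypothetical sketch of such a statically typed building block - a generator wrapping a function from a randomness source to a value, with Select as the functor's mapping operation (this isn't AutoFixture's actual API, just an illustration of the generics-based style):

using System;

// A hypothetical generator: a function from a randomness source to a value.
public sealed class Gen<T>
{
    private readonly Func<Random, T> generate;

    public Gen(Func<Random, T> generate)
    {
        this.generate = generate;
    }

    public T Sample(Random rng)
    {
        return this.generate(rng);
    }

    // Functor map: build a Gen<TResult> from a Gen<T> and a pure function.
    public Gen<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        return new Gen<TResult>(rng => selector(this.generate(rng)));
    }
}

// Usage:
// var positiveInts = new Gen<int>(r => r.Next(1, 1000));
// var labels = positiveInts.Select(i => "item-" + i);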
This is closely related to how something like QuickCheck works. When you write QuickCheck-based tests, you must supply generators for all of your own custom types. Haskell doesn't support run-time casting of values, but instead relies exclusively on generics and some compile-time automation. Granted, Haskell's generics are more powerful than C#'s, so you can't necessarily transfer the knowledge gained from Haskell to C#. It does suggest, however, that it's possible to write code entirely without relying on run-time casting.
AutoFixture does, however, support user-defined types without the need for the user to write custom generators. It does this via .NET Reflection. In .NET, the Reflection API is untyped; all the methods for generating objects and invoking members take object as input and return object as output.
Any application, library, or framework based on Reflection will have to perform some run-time casting. I don't see how to get around that.
Would it be possible to write data generators without Reflection? I haven't tried the following, but perhaps one could adopt a strategy where one writes 'the code' for a data generator directly in IL and uses Reflection.Emit to dynamically compile an in-memory assembly that contains the generators.
This is a bit like how the Hiro container works, IIRC. I suppose that one could design other types of general-purpose frameworks around this concept, but I rarely see it done in .NET.
This may make a lot of C# programmers cringe, but is it ok to virtual-ize every method in a base class -- even if certain methods are never overridden?
The reason I need to do this is that I have a special case where I need to get C# to act like Java. It's actually an automatic program transformation of a Java program.
I'm thinking of marking any Java method that has no base method as virtual, and any that does have an associated base method as override.
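For example, a transformation along these lines (the class names are made up) would turn a small Java hierarchy into this C#:

using System;

// Java: class Animal { void speak() { ... } }
//       class Dog extends Animal { @Override void speak() { ... } }
public class Animal
{
    // No base method in Java, so it becomes virtual here.
    public virtual void Speak() { Console.WriteLine("..."); }
}

public class Dog : Animal
{
    // Overrides a base method in Java, so it becomes an override here.
    public override void Speak() { Console.WriteLine("Woof"); }
}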
Aside from a lack of flexibility, are there any other issues with doing it this way?
Yes, it is okay, but not necessarily good practice. Virtualization helps with two things: inheritance and decoupling (for things like unit testing or swapping out classes for new ones).
Prefer composition over inheritance in your OO design and you can use interfaces instead of virtuals with your classes. That will give you what you need for both unit testing and composition.
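As a quick, made-up illustration of that: the dependency is expressed as an interface, so it can be swapped in tests or recomposed without any virtual methods on the classes themselves.

using System;

public interface IClock { DateTime Now { get; } }

public sealed class SystemClock : IClock
{
    public DateTime Now { get { return DateTime.Now; } }
}

// The consumer composes an IClock rather than inheriting behaviour.
public sealed class Greeter
{
    private readonly IClock clock;

    public Greeter(IClock clock) { this.clock = clock; }

    public string Greet()
    {
        return this.clock.Now.Hour < 12 ? "Good morning" : "Good afternoon";
    }
}

// In a unit test, a fake IClock returning a fixed time replaces SystemClock.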
But, with the speed of today's CPUs, I'd not worry terribly about the extra V-table lookup if everything is virtual.
So I suggest, if you can solve your problem by providing an interface for the java program, do that. Otherwise, don't lose any sleep over having the methods virtual.
Not only is this OK, but some applications need this.
NHibernate, for example, requires you to mark properties as virtual so its mapping proxies can work.
This is OK. It won't cause any unwanted behavior, and as far as I know, virtual methods don't noticeably slow a class down. So, why not?
I came across a technique in template metaprogramming which makes it possible to implement polymorphism without the virtual function mechanism.
Hence I am wondering whether there are other tricks one can use to get polymorphic behavior in C++ or C#.
EDIT: Also, some time ago I read that the visitor design pattern is an alternative to the virtual mechanism, but I cannot recall the article. Can someone confirm whether it can be used too?
EDIT 2: I understand this is not ideal programming practice, but a hack is what I am looking for, since optimization is the primary concern. The class hierarchy is fixed at compile time (the pointers are not assigned to the classes at run time by if-else logic, etc.).
Do you ever wonder why polymorphism tends to add around 4 bytes of overhead per object (the vtable pointer)? It's because that's the simplest, most practical way to implement polymorphism. There are grotesque hacks possible that can simulate C++/C#/Objective-C polymorphism with less overhead, but they come with great tradeoffs: several times the CPU usage per call, for instance, or statically stored class hierarchies with limited extensibility.
Polymorphism is implemented the way it is implemented because the way it is implemented is already optimal.
(EDIT: This answer is in the context of C#.)
You're not going to save 4 bytes per object anyway. The type will still have a single vtable to look up function member implementations. You might save one entry in that table by avoiding a virtual method, but that's a single entry in a single table (one per type) - it's not going to affect you on a per-instance basis.
Either you're missing something, or I am. It would help if you could edit the question to show what you're trying to do in the most natural implementation - then explain where you're trying to save space.
Templates can only be used to implement compile-time polymorphism. I'm not aware of any mechanism to implement runtime polymorphism without a space overhead, but why does it matter? If you need runtime polymorphism to solve your problem, you should use it.
If you really want to save memory, you could implement your own memory handling. Block headers and footers are larger than 4 bytes each (I think they're 8 bytes each), so putting everything in a huge memory block and doing your own indexing would be the way to go. Don't even use objects, just binary indices.
My point is, unless you're designing low-level databases, OS kernels, or a SOC, you really shouldn't be worrying about this. Especially with C#. Do you know how much overhead garbage collection has??
There are lots of tricks, ranging from well-known techniques (e.g., the CRTP in C++) to rather ugly hacks (I once roughly quintupled the capacity of a particular program by eliminating vtable pointers; instead, I implemented a separate allocator for each of the classes and found the correct vtable based on the object's address).
The visitor pattern really just displaces polymorphism rather than replacing it. In other words, what could/would have been polymorphic behavior in the class being visited is moved into the visitor class. To get dynamic binding, however, you still have a hierarchy of visitor classes, typically using the usual vtable mechanism to implement polymorphism. Under the right circumstances, this can still save a fair amount of memory though, as the number of visitor objects is often much smaller than the number of objects they visit.
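Here's a minimal C# sketch of that displacement (the shape and visitor names are made up): the shapes expose a single Accept method, and what would have been per-shape virtual behaviour lives in the visitor, which itself still relies on ordinary virtual dispatch:

using System;

public interface IShapeVisitor
{
    void VisitCircle(Circle c);
    void VisitSquare(Square s);
}

public abstract class Shape
{
    public abstract void Accept(IShapeVisitor visitor);
}

public sealed class Circle : Shape
{
    public double Radius;
    public override void Accept(IShapeVisitor visitor) { visitor.VisitCircle(this); }
}

public sealed class Square : Shape
{
    public double Side;
    public override void Accept(IShapeVisitor visitor) { visitor.VisitSquare(this); }
}

// What could have been a virtual Area() method on each shape now lives here.
public sealed class AreaVisitor : IShapeVisitor
{
    public double Area;
    public void VisitCircle(Circle c) { Area = Math.PI * c.Radius * c.Radius; }
    public void VisitSquare(Square s) { Area = s.Side * s.Side; }
}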
Boost.Variant is a C++ library implementation of the visitor design pattern that avoids dynamic polymorphism.
And for your edification, the C++ static polymorphism approach you mentioned has a name: CRTP.
I am planning to use the dynamic keyword in my new project, but before stepping in I would like to know the pros and cons of using the dynamic keyword over Reflection.
Following are the pros I could find with respect to the dynamic keyword:
Readable/Maintainable code.
Fewer lines of code.
And these are the negatives I have heard associated with using the dynamic keyword:
Affects application performance.
Dynamic keyword is internally a wrapper of Reflection.
Dynamic typing might turn into breeding ground for hard to find bugs.
Affects interoperability with previous .NET versions.
Please help me on whether the pros and cons I came across are sensible or not?
Please help me on whether the pros and cons I came across are sensible or not?
The concern I have with your pros and cons is that some of them do not address differences between using reflection and using dynamic. That dynamic typing makes for bugs that are not caught until runtime is true of any dynamic typing system. Reflection code is just as likely to have a bug as code that uses the dynamic type.
Rather than thinking of it in terms of pros and cons, think about it in more neutral terms. The question I'd ask is "What are the differences between using Reflection and using the dynamic type?"
First: with Reflection you get exactly what you asked for. With dynamic, you get what the C# compiler would have done had it been given the type information at compile time. Those are potentially two completely different things. If you have a MethodInfo to a particular method, and you invoke that method with a particular argument, then that is the method that gets invoked, period. If you use "dynamic", then you are asking the DLR to work out at runtime what the C# compiler's opinion is about which is the right method to call. The C# compiler might pick a method different than the one you actually wanted.
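A small sketch of that difference (the Widget type and its overloads are invented for illustration): Reflection invokes exactly the MethodInfo you picked, while dynamic replays C# overload resolution at run time using the runtime type of the dynamic argument:

using System;
using System.Reflection;

public class Widget
{
    public void Print(object value) { Console.WriteLine("object overload"); }
    public void Print(string value) { Console.WriteLine("string overload"); }
}

public static class ReflectionVersusDynamic
{
    public static void Main()
    {
        var widget = new Widget();
        object arg = "hello";

        // Reflection: we explicitly picked Print(object), so that exact
        // method is invoked, period.
        MethodInfo print = typeof(Widget).GetMethod("Print", new[] { typeof(object) });
        print.Invoke(widget, new[] { arg });      // prints "object overload"

        // dynamic: the runtime binder redoes C# overload resolution using
        // the runtime type of the dynamic argument, so Print(string) wins.
        dynamic dynamicArg = arg;
        widget.Print(dynamicArg);                 // prints "string overload"
    }
}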
Second: with Reflection you can (if your code is granted suitably high levels of trust) do private reflection. You can invoke private methods, read private fields, and so on. Whether doing so is a good idea, I don't know. It certainly seems dangerous and foolish to me, but I don't know what your application is. With dynamic, you get the behaviour that you'd get from the C# compiler; private methods and fields are not visible.
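And a sketch of that second point (the Secret class is invented; this requires sufficiently trusted code): Reflection can reach the private field, while dynamic only sees what ordinary C# would see:

using System;
using System.Reflection;

public class Secret
{
    private string password = "hunter2";
}

public static class PrivateReflectionSketch
{
    public static void Main()
    {
        var secret = new Secret();

        // Private reflection: works if the code is granted enough trust.
        FieldInfo field = typeof(Secret).GetField(
            "password", BindingFlags.NonPublic | BindingFlags.Instance);
        Console.WriteLine(field.GetValue(secret));   // prints "hunter2"

        // dynamic behaves like normal C#: the private field is not visible.
        // The two lines below compile but fail at run time with a binder error.
        // dynamic d = secret;
        // Console.WriteLine(d.password);
    }
}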
Third: with Reflection, the code you write looks like a mechanism. It looks like you are loading a metadata source, extracting some types, extracting some method infos, and invoking methods on receiver objects through the method info. Every step of the way looks like the operation of a mechanism. With dynamic, every step of the way looks like business logic. You invoke a method on a receiver the same way as you'd do it in any other code. What is important? In some code, the mechanism is actually the most important thing. In some code, the business logic that the mechanism implements is the most important thing. Choose the technique that emphasises the right level of abstraction.
Fourth: the performance costs are different. With Reflection you do not get any cached behaviour, which means that operations are generally slower, but there is no memory cost for maintaining the cache and every operation is roughly the same cost. With the DLR, the first operation is very slow indeed as it does a huge amount of analysis, but the analysis is cached and reused. That consumes memory, in exchange for increased speed in subsequent calls in some scenarios. What the right balance of speed and memory usage is for your application, I don't know.
Readable/Maintainable code
Certainly true in my experience.
Fewer lines of code.
Not significantly, but it will help.
Affects application performance.
Very slightly, but not even close to the impact Reflection has.
Dynamic keyword is internally a wrapper of Reflection.
Completely untrue. The dynamic keyword leverages the Dynamic Language Runtime (DLR).
[Edit: correction as per comment below]
It would seem that the Dynamic Language Runtime does use Reflection and the performance improvements are only due to caching techniques.
Dynamic typing might turn into breeding ground for hard to find bugs.
This may be true; it depends on how you write your code. You are effectively removing compiler checking from your code. If your test coverage is good, this probably won't matter; if not, then I suspect you will run into problems.
Affects interoperability with previous .NET versions
Not true. I mean you won't be able to compile your code against older versions, but if you want to do that then you should use the old versions as a base and up-compile it rather than the other way around. But if you want to use a .NET 2 library then you shouldn't run into too many problems, as long as you include the declaration in app.config / web.config.
One significant pro that you're missing is the improved interoperability with COM/ATL components.
There are four main differences between dynamic and Reflection. Below is a detailed explanation of each. Reference: http://www.codeproject.com/Articles/593881/What-is-the-difference-between-Reflection-and-Dyna
Point 1. Inspect vs. Invoke
Reflection can do two things: it can inspect metadata, and it can invoke methods at runtime. With dynamic we can only invoke methods. So if I am creating software like the Visual Studio IDE, then Reflection is the way to go. If I just want dynamic invocation from my C# code, dynamic is the best option.
Point 2. Private vs. Public Invoke
You cannot invoke private methods using dynamic. With Reflection it is possible to invoke private methods.
Point 3. Caching
dynamic uses Reflection internally and adds caching benefits on top. So if you just want to invoke an object dynamically, dynamic is the better choice, as you get the performance benefits.
Point 4. Static classes
dynamic is instance-specific: you don't have access to static members; you have to use Reflection in those scenarios.
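A hedged sketch of points 3 and 4 (the Calculator class is made up): dynamic handles instance members and caches the call site, while static members go through Reflection:

using System;
using System.Reflection;

public class Calculator
{
    public int Add(int a, int b) { return a + b; }
    public static int Multiply(int a, int b) { return a * b; }
}

public static class DynamicVersusReflection
{
    public static void Main()
    {
        // dynamic: instance members resolve at run time, and the DLR caches
        // the call site after the first invocation.
        dynamic calc = new Calculator();
        Console.WriteLine(calc.Add(2, 3));                               // 5

        // Static members are not reachable through a dynamic instance,
        // so Reflection is used for them instead.
        MethodInfo multiply = typeof(Calculator).GetMethod("Multiply");
        Console.WriteLine(multiply.Invoke(null, new object[] { 2, 3 })); // 6
    }
}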
In most cases, using the dynamic keyword will not result in meaningfully shorter code. In some cases it will; that depends on the provider and as such it's an important distinction. You should probably never use the dynamic keyword to access plain CLR objects; the benefit there is too small.
The dynamic keyword undermines automatic refactoring tools and makes high-coverage unit tests more important; after all, the compiler isn't checking much of anything when you use it. That's not as much of an issue when you're interoperating with a very stable or inherently dynamically typed API, but it's particularly nasty if you use keyword dynamic to access a library whose API might change in the future (such as any code you yourself write).
Use the keyword sparingly, where it makes sense, and make sure such code has ample unit tests. Don't use it where it's not needed or where type inference (e.g. var) can do the same.
Edit: You mention below that you're doing this for plug-ins. The Managed Extensibility Framework was designed with this in mind - it may be a better option than the dynamic keyword and reflection.
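For reference, here's a minimal MEF sketch (the plug-in contract is invented; MEF lives in System.ComponentModel.Composition): exports are discovered from a catalog and composed into strongly typed imports, so the host code needs neither casting nor dynamic:

using System;
using System.Collections.Generic;
using System.ComponentModel.Composition;
using System.ComponentModel.Composition.Hosting;
using System.Reflection;

// The contract that plug-ins implement.
public interface IPlugin
{
    string Name { get; }
    void Run();
}

// A plug-in marks itself with [Export]; MEF discovers it via a catalog.
[Export(typeof(IPlugin))]
public class HelloPlugin : IPlugin
{
    public string Name { get { return "Hello"; } }
    public void Run() { Console.WriteLine("Hello from a plug-in"); }
}

public class PluginHost
{
    // MEF fills this with every discovered IPlugin export.
    [ImportMany]
    public IEnumerable<IPlugin> Plugins { get; set; }

    public void Compose()
    {
        var catalog = new AssemblyCatalog(Assembly.GetExecutingAssembly());
        using (var container = new CompositionContainer(catalog))
        {
            container.ComposeParts(this);   // populates Plugins
        }
    }
}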
If you are using dynamic specifically to do reflection, your only concern is compatibility with previous versions. Otherwise it wins over Reflection because it is more readable and shorter. You will lose strong typing and (some) performance from the very use of reflection anyway.
The way I see it, all your cons for using dynamic, except interoperability with older .NET versions, are also present when using Reflection:
Affects application performance
While it does affect performance, so does using Reflection. From what I remember, the DLR more or less uses Reflection the first time you access a method/property of your dynamic object for a given type and caches the type/access-target pair, so that later access is just a lookup in the cache, making it faster than Reflection.
Dynamic keyword is internally a wrapper of Reflection
Even if it were true (see above), how would that be a negative point? Whether or not it wraps Reflection shouldn't influence your application in any significant way.
Dynamic typing might turn into breeding ground for hard to find bugs
While this is true, as long as you use it sparingly it shouldn't be that much of a problem. Furthermore, if you basically use it as a replacement for reflection (that is, you use dynamic only for the briefest possible durations when you want to access something via reflection), the risk of such bugs shouldn't be significantly higher than if you used reflection to access your methods/properties (of course, if you make everything dynamic it can be more of a problem).
Affects interoperability with previous .NET versions
For that you have to decide for yourself how much of a concern it is.
I've been searching for this for quite a while with no luck so far. Is there an equivalent to Java's ClassFileTransformer in .NET? Basically, I want to create a class CustomClassFileTransformer (which in Java would implement the interface ClassFileTransformer) that gets called whenever a class is loaded, and is allowed to tweak it and replace it with the tweaked version.
I know there are frameworks that do similar things, but I was looking for something more straightforward, like implementing my own ClassFileTransformer. Is it possible?
EDIT #1. More details about why I need this:
Basically, I have a C# application and I need to monitor the instructions it wants to run in order to detect read or write operations to fields (operations Ldfld and Stfld) and insert some instructions before the read/write takes place.
I know how to do this (except for the part where I need to be invoked to replace the class): for every method whose code I want to monitor, I must (see the sketch after this list):
Get the method's MethodBody using MethodBase.GetMethodBody()
Transform it to byte array with MethodBody.GetILAsByteArray(). The byte[] it returns contains the bytecode.
Analyse the bytecode as explained here, possibly inserting new instructions or deleting/modifying existing ones by changing the contents of the array.
Create a new method and use the new bytecode to create its body, with MethodBuilder.CreateMethodBody(byte[] il, int count), where il is the array with the bytecode.
I put all these tweaked methods in a new class and use the new class to replace the one that was originally going to be loaded.
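Here's a rough sketch of those steps (the type and method names are mine, the 'analysis' step is reduced to a pass-through, and a real implementation would also have to remap metadata tokens, locals and exception handlers):

using System;
using System.Reflection;
using System.Reflection.Emit;

public static class MethodRewriter
{
    // Builds a new dynamic type whose method body starts from the IL of an
    // existing method. The actual tweaking of the IL is left out here.
    public static Type RewriteIntoNewType(MethodInfo source)
    {
        // Steps 1-2: get the method body and its raw IL bytes.
        MethodBody body = source.GetMethodBody();
        byte[] il = body.GetILAsByteArray();

        // Step 3: analyse/patch the byte array here, e.g. find Ldfld/Stfld
        // opcodes and insert instrumentation (omitted in this sketch).

        // Step 4: emit a new type with a method built from the (patched) bytes.
        var assemblyBuilder = AppDomain.CurrentDomain.DefineDynamicAssembly(
            new AssemblyName("TweakedAssembly"), AssemblyBuilderAccess.Run);
        var moduleBuilder = assemblyBuilder.DefineDynamicModule("TweakedModule");
        var typeBuilder = moduleBuilder.DefineType("TweakedType", TypeAttributes.Public);

        var parameterTypes = Array.ConvertAll(source.GetParameters(), p => p.ParameterType);
        var methodBuilder = typeBuilder.DefineMethod(
            source.Name,
            MethodAttributes.Public | MethodAttributes.Static,
            source.ReturnType,
            parameterTypes);

        // The raw IL still refers to metadata tokens of the original module,
        // so this only works once those tokens are remapped for the new module.
        methodBuilder.CreateMethodBody(il, il.Length);

        return typeBuilder.CreateType();
    }
}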
An alternative to replacing classes would be somehow getting notified whenever a method is invoked. Then I'd replace the call to that method with a call to my own tweaked method, which I would tweak only the first time it is invoked and then put in a dictionary for future uses, to reduce overhead (for future calls I'd just look up the method and invoke it; I wouldn't need to analyse the bytecode again). I'm currently investigating ways to do this, and LinFu looks pretty interesting, but if there were something like a ClassFileTransformer it would be much simpler: I'd just rewrite the class, replace it, and let the code run without monitoring anything.
An additional note: the classes may be sealed. I want to be able to replace any kind of class, I cannot impose restrictions on their attributes.
EDIT #2. Why I need to do this at runtime.
I need to monitor everything that is going on so that I can detect every access to data. This applies to the code of library classes as well. However, I cannot know in advance which classes are going to be used, and even if I knew every possible class that may get loaded it would be a huge performance hit to tweak all of them instead of waiting to see whether they actually get invoked or not.
POSSIBLE (BUT PRETTY HARDCORE) SOLUTION. In case anyone is interested (and I see the question has been favorited, so I guess someone is), this is what I'm looking at right now. Basically I'd have to implement the CLR profiling API and register for the events I'm interested in - in my case, whenever a JIT compilation starts. An extract from the blog post:
In your ICorProfilerCallback2::ModuleLoadFinished callback, you call ICorProfilerInfo2::GetModuleMetadata to get a pointer to a metadata interface on that module.
QI for the metadata interface you want. Search MSDN for "IMetaDataImport", and grope through the table of contents to find topics on the metadata interfaces.
Once you're in metadata-land, you have access to all the types in the module, including their fields and function prototypes. You may need to parse metadata signatures and this signature parser may be of use to you.
In your ICorProfilerCallback2::JITCompilationStarted callback, you may use ICorProfilerInfo2::GetILFunctionBody to inspect the original IL, and ICorProfilerInfo2::GetILFunctionBodyAllocator and then ICorProfilerInfo2::SetILFunctionBody to replace that IL with your own.
The great news: I get notified when a JIT compilation starts and I can replace the bytecode right there, without having to worry about replacing the class, etc. The not-so-great news: you cannot invoke managed code from the API's callback methods, which makes sense but means I'm on my own parsing the IL code, etc., as opposed to being able to use Cecil, which would've been a breeze.
I don't think there's a simpler way to do this without using AOP frameworks (such as PostSharp). If anyone has any other idea please let me know. I'm not marking the question as answered yet.
I don't know of a direct equivalent in .NET for this.
There are, however, some ways to implement similar functionality, such as using Reflection.Emit to generate assemblies and types on demand, or using RealProxy to create proxy objects for interfaces and MarshalByRefObject-derived objects. To advise what to use, though, it would be important to know more about the actual use case.
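As a rough sketch of the RealProxy option (the service type below is made up; RealProxy lives in System.Runtime.Remoting.Proxies, works only for interfaces and MarshalByRefObject-derived classes, and is .NET Framework only):

using System;
using System.Runtime.Remoting.Messaging;
using System.Runtime.Remoting.Proxies;

public interface IService
{
    int DoWork(int input);
}

public class Service : MarshalByRefObject, IService
{
    public int DoWork(int input) { return input * 2; }
}

// Intercepts every call made through the transparent proxy.
public class LoggingProxy<T> : RealProxy
{
    private readonly T target;

    public LoggingProxy(T target) : base(typeof(T))
    {
        this.target = target;
    }

    public override IMessage Invoke(IMessage msg)
    {
        var call = (IMethodCallMessage)msg;
        Console.WriteLine("Calling " + call.MethodName);

        object result = call.MethodBase.Invoke(this.target, call.InArgs);
        return new ReturnMessage(result, null, 0, call.LogicalCallContext, call);
    }
}

// Usage:
// var proxy = (IService)new LoggingProxy<IService>(new Service()).GetTransparentProxy();
// proxy.DoWork(21);   // logs "Calling DoWork" and returns 42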
After quite some research I'm answering my own question: there isn't an equivalent to the ClassFileTransformer in .NET, or any straightforward way to replace classes.
It's possible to gain control over the class-loading process by hosting the CLR, but this is pretty low-level, you have to be careful with it, and it's not possible in every scenario. For example if you're running on a server you may not have the rights to host the CLR. Also if you're running an ASP.NET application you cannot do this because ASP.NET already provides a CLR host.
It's a pity .NET doesn't support this; it would be so easy for them to do it - they would just have to notify you before a class is loaded and give you the chance to modify the class before passing it on to the CLR to load.