C# generics implementation

I am new to C# and was trying to write an SNTP server on the weekend.
During the course of this development I ended up with exactly the same question as this one: How to use generics to pass argument to a non-generic method?
The question, repeated here, is: "How to use generics to pass argument to a non-generic method?" The crucial point of the answer was that the non-generic method in question didn't have an overload that accepts an Object.
Now the question I have is a follow-on one: why is generics implemented this way? Or, to rephrase, why are constraints required at all?
My understanding so far is that generics help to preserve compile time type safety which means that the compiler knows what types are being dealt with at compile time.
Why wasn't C# (or perhaps this question should pertain to the CLR) implemented such that the compiler accepts that a generic class/method is being created in which an argument may be provided that is not acceptable in all cases? Then, when the generic class/method gets invoked, the compiler could see the problem and complain at that point.
Is this a technical limitation?
It just seems like a real pity that a generic method cannot be created to wrap a non-generic method with multiple overloads. Unless I opt to defer type checking to run time, which is the solution to the aforementioned question, I would have to wrap this overloaded method with a suite of methods, one for each signature, even though the code inside each of them would look identical. This would have been an ideal place to leverage a generic method.
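To make the scenario concrete, here is a minimal sketch of the situation (the Write methods are hypothetical stand-ins, not code from the original SNTP project): a generic wrapper around a non-generic method with several overloads does not compile, because overload resolution runs once, against the compile-time type T.

using System;

static class OverloadWrapperSketch
{
    // Hypothetical overloaded non-generic methods; note there is no overload taking object.
    static void Write(int value)    => Console.WriteLine($"int: {value}");
    static void Write(string value) => Console.WriteLine($"string: {value}");

    // Does not compile: inside the generic method the compiler only knows the
    // static type T, and no Write overload accepts an arbitrary T.
    // static void WriteWrapper<T>(T value) => Write(value);   // compile-time error
}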

The person who can best explain this is Eric Lippert, and he did, in What’s the difference, part one: Generics are not templates:
We do the overload resolution once and bake in the result. We do not change it at runtime when someone, possibly in an entirely different assembly, uses string as a type argument to the method. The IL we've generated for the generic type already has the method it's going to call picked out. The jitter does not say "well, I happen to know that if we asked the C# compiler to execute right now with this additional information then it would have picked a different overload. Let me rewrite the generated code to ignore the code that the C# compiler originally generated..." The jitter knows nothing about the rules of C#.
[...]
Now, if you do want overload resolution to be re-executed at runtime based on the runtime types of the arguments, we can do that for you; that’s what the new “dynamic” feature does in C# 4.0. Just replace “object” with “dynamic” and when you make a call involving that object, we’ll run the overload resolution algorithm at runtime and dynamically spit code that calls the method that the compiler would have picked, had it known all the runtime types at compile time.
So why not: because the runtime wouldn't know how to re-generate the required code.
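As a minimal sketch of the dynamic-based workaround the quote describes (reusing the hypothetical Write overloads from the snippet above), the cast to dynamic defers overload resolution to run time; the trade-off is that an unsupported argument type fails at run time instead of at compile time.

using System;

static class DynamicDispatchSketch
{
    static void Write(int value)    => Console.WriteLine($"int: {value}");
    static void Write(string value) => Console.WriteLine($"string: {value}");

    // Casting to dynamic makes the binder re-run overload resolution at run time
    // against the runtime type of 'value'.
    static void WriteWrapper<T>(T value) => Write((dynamic)value);

    static void Main()
    {
        WriteWrapper(42);        // prints "int: 42"
        WriteWrapper("hello");   // prints "string: hello"
    }
}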
And something about a design philosophy that your code should fail as early as possible, preferably during compile time, but I can't find that quote right now.

Is object casting an inevitability of reality when there is a need to design modular architecture?

It is common to read around that object casting is a bad practice and should be avoided, for instance Why should casting be avoided? question has gotten some answers with great arguments:
By Jerry Coffin:
Looking at things more generally, the situation's pretty simple (at least IMO): a cast (obviously enough) means you're converting something from one type to another. When/if you do that, it raises the question "Why?" If you really want something to be a particular type, why didn't you define it to be that type to start with? That's not to say there's never a reason to do such a conversion, but anytime it happens, it should prompt the question of whether you could re-design the code so the correct type was used throughout.
By Eric Lippert:
Both kinds of casts are red flags. The first kind of cast raises the question "why exactly is it that the developer knows something that the compiler doesn't?" If you are in that situation then the better thing to do is usually to change the program so that the compiler does have a handle on reality. Then you don't need the cast; the analysis is done at compile time.
The second kind of cast raises the question "why isn't the operation being done in the target data type in the first place?" If you need a result in ints then why are you holding a double in the first place? Shouldn't you be holding an int?
Moving on to my question: recently I have started to look into the source code of the well-known open source project AutoFixture, originally developed by Mark Seemann, which I really appreciate.
One of the main components of the library is the interface ISpecimenBuilder, which defines a single, rather abstract method:
object Create(object request, ISpecimenContext context);
As you can see, the request parameter is typed as object, so it accepts completely different kinds of values. Different implementations of the interface treat requests according to their runtime type, checking whether a request is something they are capable of dealing with and otherwise returning some kind of "no response" representation.
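For illustration, a typical builder looks roughly like this simplified, hypothetical sketch (not taken verbatim from AutoFixture; it assumes the AutoFixture.Kernel namespace of recent versions, where NoSpecimen is the "no response" representation):

using System;
using System.Reflection;
using AutoFixture.Kernel;

// A hypothetical builder that only handles requests for DateTime properties
// and signals "not my request" for everything else.
public class FixedDateTimeBuilder : ISpecimenBuilder
{
    public object Create(object request, ISpecimenContext context)
    {
        // The runtime type check (and cast) the question is about.
        var property = request as PropertyInfo;
        if (property == null || property.PropertyType != typeof(DateTime))
            return new NoSpecimen();

        return new DateTime(2000, 1, 1);
    }
}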
It seems that the design of the interface does not adhere to the "good practice" that object casting should be used sparingly.
I was wondering whether there is a better way to design this contract, one that avoids all the casting, but I couldn't find any solution.
Obviously the object parameter could be replaced with some marker interface, but that would not save us from the casting problem. I have also thought about using some variation of the visitor pattern as described here, but it does not seem very scalable: the visitor would have to have dozens of methods, since there are so many implementations of the interface, each capable of dealing with a different type of request.
Although I basically agree with the arguments against using casting as part of a good design, in this specific scenario it seems not only the best option but also the only realistic one.
To sum up: are object casting and very general contracts an inevitability of reality when there is a need to design a modular and extensible architecture?
I don't think that I can answer this question generally, for any type of application or framework, but I can offer an answer that specifically talks about AutoFixture, as well as offer some speculation about other usage scenarios.
If I had to write AutoFixture from scratch today, there are certainly things I'd do differently. Particularly, I wouldn't design the day-to-day API around something like ISpecimenBuilder. Rather, I'd design the data manipulation API around the concepts of functors and monads, as outlined here.
This design is based entirely on generics, but it does require statically typed building blocks (also described in the article) known at compile time.
This is closely related to how something like QuickCheck works. When you write QuickCheck-based tests, you must supply generators for all of your own custom types. Haskell doesn't support run-time casting of values, but instead relies exclusively on generics and some compile-time automation. Granted, Haskell's generics are more powerful than C#'s, so you can't necessarily transfer the knowledge gained from Haskell to C#. It does suggest, however, that it's possible to write code entirely without relying on run-time casting.
AutoFixture does, however, support user-defined types without the need for the user to write custom generators. It does this via .NET Reflection. In .NET, the Reflection API is untyped; all the methods for generating objects and invoking members take object as input and return object as output.
Any application, library, or framework based on Reflection will have to perform some run-time casting. I don't see how to get around that.
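A small illustration of why: even the most basic Reflection calls traffic in object, so a cast appears as soon as you want to do anything typed with the result.

using System;

class ReflectionIsUntyped
{
    static void Main()
    {
        // Activator.CreateInstance returns object, not Version.
        object boxed = Activator.CreateInstance(typeof(Version), 1, 0);

        // To use the result as its real type, a runtime cast is unavoidable.
        Version version = (Version)boxed;
        Console.WriteLine(version.Major);   // 1
    }
}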
Would it be possible to write data generators without Reflection? I haven't tried the following, but perhaps one could adopt a strategy where one writes 'the code' for a data generator directly in IL and uses Reflection.Emit to dynamically compile an in-memory assembly that contains the generators.
This is a bit like how the Hiro container works, IIRC. I suppose that one could design other types of general-purpose frameworks around this concept, but I rarely see it done in .NET.
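As a rough, untested sketch of that idea, here is about the smallest possible use of System.Reflection.Emit: a hypothetical generator that just returns a constant (a real data generator would emit calls to constructors and property setters instead).

using System;
using System.Reflection.Emit;

class EmitSketch
{
    static void Main()
    {
        // Emit a parameterless method whose IL body is simply "return 42".
        var method = new DynamicMethod("GenerateInt", typeof(int), Type.EmptyTypes);
        var il = method.GetILGenerator();
        il.Emit(OpCodes.Ldc_I4, 42);   // push the constant 42
        il.Emit(OpCodes.Ret);          // return it

        var generator = (Func<int>)method.CreateDelegate(typeof(Func<int>));
        Console.WriteLine(generator());   // 42
    }
}

Note that even here a cast sneaks in: CreateDelegate returns Delegate, which has to be cast to Func<int>.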

c# optional parameters in constructor [duplicate]

I came across this today, and I am surprised that I haven't noticed it before. Given a simple C# program similar to the following:
public class Program
{
    public static void Main(string[] args)
    {
        Method();           // Called the method with no arguments.
        Method("a string"); // Called the method with a string.
        Console.ReadLine();
    }

    public static void Method()
    {
        Console.WriteLine("Called the method with no arguments.");
    }

    public static void Method(string aString = "a string")
    {
        Console.WriteLine("Called the method with a string.");
    }
}
You get the output shown in the comments for each method call.
I understand why the compiler chooses the overloads that it does, but why is this allowed in the first place? I am not asking what the overload resolution rules are, I understand those, but I am asking if there is a technical reason why the compiler allows what are essentially two overloads with the same signature?
As far as I can tell, a function overload with a signature that differs from another overload only through having an additional optional argument offers nothing more than it would if the argument (and all preceding arguments) were simply required.
One thing it does do is make it possible for a programmer (who probably isn't paying enough attention) to think they're calling a different overload from the one they actually are.
I suppose it's a fairly uncommon case, and the answer for why this is allowed may just be because it's simply not worth the complexity to disallow it, but is there another reason why C# allows function overloads to differ from others solely through having one additional optional argument?
His point that Eric Lippert could have an answer led me to https://meta.stackoverflow.com/a/323382/1880663, which makes it sound like my question will only annoy him. I'll try to rephrase it to make it clearer that I'm asking about the language design, and that I'm not looking for a spec reference.
I appreciate it! I am happy to talk about language design; what annoys me is when I waste time doing so when the questioner is very unclear about what would actually satisfy their request. I think your question was phrased clearly.
The comment to your question posted by Hans is correct. The language design team was well aware of the issue you raise, and this is far from the only potential ambiguity created by optional / named arguments. We considered a great many scenarios for a long time and designed the feature as carefully as possible to mitigate potential problems.
All design processes are the result of compromise between competing design principles. Obviously there were many arguments for the feature that had to be balanced against the significant design, implementation and testing costs, as well as the costs to users in the form of confusion, bugs, and so on, from accidental construction of ambiguities such as the one you point out.
I'm not going to rehash what was dozens of hours of debate; let me just give you the high points.
The primary motivating scenario for the feature was, as Hans notes, popular demand, particularly coming from developers who use C# with Office. (And full disclosure, as a guy on the team that wrote the C# programming model for Word and Excel before I joined the C# team, I was literally the first one asking for it; the irony that I then had to implement this difficult feature a couple years later was not lost on me.) Office object models were designed to be used from Visual Basic, a language that has long had optional / named parameter support.
C# 4 might have seemed like a bit of a "thin" release in terms of obvious features. That's because a lot of the work done in that release was infrastructure for allowing more seamless interoperability with object models that were designed for dynamic languages. The dynamic typing feature is the obvious one, but there were numerous other small features added that combine together to make working with dynamic and legacy COM object models easier. Named / optional arguments was just one of them.
The fact that we had existing languages like VB that had this specific feature for decades and the world hadn't ended yet was further evidence that the feature was both doable and valuable. It's great having an example where you can learn from its successes and failures before designing a new version of the feature.
As for the specific situation you mention: we considered doing things like detecting when there was a possible ambiguity and issuing a warning, but that opens up a whole other can of worms. Warnings have to be for code that is common, plausible and almost certainly wrong, and there should be a clear way to address the problem that causes the warning to go away. Writing an ambiguity detector is a lot of work; believe me, it took way longer to write the ambiguity detection in overload resolution than it took to write the code to handle successful cases. We didn't want to spend a lot of time on adding a warning for a rare scenario that is hard to detect and for which there might be no clear advice on how to eliminate the warning.
Also, frankly, if you write code where you have two methods named the same thing that do something completely different depending on which one you call, you already have a larger design problem on your hands! Fix that problem first, rather than worrying that someone is going to accidentally call the wrong method; make it so that either method is the right one to call.
This behaviour is specified by Microsoft on MSDN. Have a look at Named and Optional Arguments (C# Programming Guide).
If two candidates are judged to be equally good, preference goes to a candidate that does not have optional parameters for which arguments were omitted in the call. This is a consequence of a general preference in overload resolution for candidates that have fewer parameters.
One reason they may have decided to implement it this way is so that a method can be overloaded afterwards without having to change all the method calls that are already written.
UPDATE
I'm surprised that even Jon Skeet has no real explanation of why they did it like this.
I think this question basically boils down to how those signatures are represented by the intermediate language. Note that the signatures of both overloads are not equal! The second method has a signature like this:
.method public hidebysig static void Method([opt] string aString) cil managed
{
    .param [1] = string('a string')
    // ...
}
In IL the signature of the method is different: it takes a string, which is marked as optional. This changes how the parameter gets initialized, but does not change the presence of the parameter.
The compiler is not able to decide which method you are calling, so it uses the one that fits best, based on the arguments you provide. Since you did not provide any arguments for the first call, it assumes that you are calling the overload without any parameters.
In the end it is a question about good code design. As a rule of thumb, I either use optional parameters or overloads, depending on what I want to do: Optional parameters are good, if the logic within the method does not depend on the provided arguments, while overloads are good to provide a different implementation for different sets of arguments. If you ever find yourself checking if a parameter equals a default value in order to decide what to do, you should probably go for an overload. On the other hand, if you find yourself repeating large chunks of code in many overloads, you should try extracting optional parameters.
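To illustrate that rule of thumb with a hypothetical example (the names are made up; the optional-parameter and overloaded variants are given different names only so they can coexist in one class):

using System;

static class RuleOfThumbSketch
{
    // Smell: branching on whether the caller passed the default value usually
    // means the two cases want different implementations.
    static void Save(string path = null)
    {
        if (path == null)
            Console.WriteLine("Saving to the default location...");
        else
            Console.WriteLine($"Saving to {path}...");
    }

    // Clearer as two overloads, each with its own implementation.
    static void SaveDefault()       => Console.WriteLine("Saving to the default location...");
    static void SaveTo(string path) => Console.WriteLine($"Saving to {path}...");

    // Good fit for an optional parameter: the logic is identical either way,
    // the argument only supplies a value.
    static void Greet(string name, string greeting = "Hello") =>
        Console.WriteLine($"{greeting}, {name}!");
}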
There's also a good answer by Chuck Skeet to this question.

How to perform overload resolution with generics programmatically

I have a number of MethodBase instances referencing different open generic methods (expected), e.g. representing the following methods:
T Foo<T>(T nevermind, T other);
T Foo<T>(string nevermind, T other);
And I have a single MethodBase instance referencing closed method that was actually called (actual), e.g.:
int Foo<int>(string nevermind, int other);
How can I programmatically check whether the actual closed method could match any of the given expected open methods, especially when considering all the generics pitfalls and complications?
Specifically, I would like to identify that the correct item from the expected list for the given actual closed method is T Foo<T>(string nevermind, T other); and not the other one.
Moreover, for a MethodBase corresponding to double Foo<double>(double something, string other) I'd like no results to match.
Is iterating through the candidate methods and checking whether each expected parameter is assignable from the corresponding actual parameter a good approach? If so, is it the simplest one? Do I need to consider any special cases so as not to match methods that would not be chosen according to .NET's overload resolution rules?
Tl;dr: The problem is not solvable using reflection, at least as I understand it, without more specificity.
Method resolution rules are extremely complicated, especially for generic methods. There are many pitfalls you will fall into. You will need to know not only the method and the type parameters, but also a lot of information about the target, along with its own type parameters, and in some cases where the method was called from. For example:
Method has implementation in a base class but is hidden by the child.
Method is from an interface, and was implemented explicitly, and may have another method with the same name on the implementer.
Methods such as Foo<T>(T a, string other), Foo<T>(string a, T other), Foo<T>(string a, string other) and some other variations cannot be disambiguated for T = string unless you know where the call is coming from (these are legal methods, and the one that gets called depends on several things).
Generic constraints can be placed on methods.
Polymorphism on the argument types, including generic variance for interfaces and delegates.
Optional parameters.
This goes on and on.
Basically, it can never work. Not using reflection. Not the way you're proposing. Even if you have restrictions about what calls can be made, you'd have to decide which things to check and which not, and you will always miss a few. These aren't the only pitfalls by the way, just a random sampling.
However, you do have some options.
The first, and best option in my opinion, is going a step back and thinking about the original problem. Post that if you can. It might have a different answer, and people will be able to advise you better. Hopefully it's less complicated to understand.
If you limited the scope of the problem greatly, such as allowing no generic constraints, no interfaces, and so forth, this might be possible. It would be error-prone, because there are lots of gotchas.
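For example, under heavy restrictions (exact parameter-type matches after substituting the actual type arguments, ignoring betterness rules, variance, hiding and optional parameters), a naive matcher might look like this sketch; note that for the ambiguous T = string methods above it will simply return more than one candidate.

using System;
using System.Linq;
using System.Reflection;

static class NaiveOverloadMatcher
{
    // Returns the open generic candidates that, when closed over the actual
    // call's type arguments, have exactly the actual call's parameter types.
    public static MethodInfo[] FindPossibleMatches(MethodInfo actualClosed,
                                                   MethodInfo[] expectedOpen)
    {
        Type[] typeArgs = actualClosed.GetGenericArguments();
        Type[] actualParameterTypes = actualClosed.GetParameters()
                                                  .Select(p => p.ParameterType)
                                                  .ToArray();

        return expectedOpen.Where(candidate =>
        {
            if (!candidate.IsGenericMethodDefinition ||
                candidate.GetGenericArguments().Length != typeArgs.Length)
                return false;

            MethodInfo closed;
            try { closed = candidate.MakeGenericMethod(typeArgs); }
            catch (ArgumentException) { return false; }   // constraints not satisfied

            return closed.GetParameters()
                         .Select(p => p.ParameterType)
                         .SequenceEqual(actualParameterTypes);
        }).ToArray();
    }
}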
You can try resolving it at runtime using dynamic binding, but the way dynamic binding resolves methods may be different from the way it normally happens. I don't know much about this, though.
You can hook the runtime and also investigate method calls as they are resolved. There are libraries for this. This will even allow you to understand how late binding is resolved.
Finally, you can look into the IL, possibly with the aid of various tools and libraries such as Mono.Cecil. In the built library, method resolution has already been performed, so you will see exactly which methods are called from which locations. This doesn't sound feasible however.
Oh, and there is Roslyn, and other compilers that expose an API. They already have the resolution logic implemented, so they may make the task easier. If they are open source, you can try to understand how method resolution is performed there. I'm kind of out of my depth here, though, and I suspect it's not feasible.
I don't like posting links to specific libraries because I'd rather you just research them. Also because there are many options.
To summarize, at least in my opinion, and as I understand the problem, without great restrictions on the methods and more information, it is impossible.

What is Run Time and Compile Time Polymorphism?

Can anyone explain run-time polymorphism and compile-time polymorphism with respect to C#?
I have found similar questions on SO but they were regarding C++.
Here is a site with a good explanation:
http://www.dickbaldwin.com/csharp/Cs000120.htm
To quote the article:
The reason that this type of polymorphism is often referred to as runtime polymorphism is because the decision as to which version of the method to execute cannot be made until runtime. The decision cannot be made at compile time (as is the case with overloaded methods).
The decision cannot be made at compile time because the compiler has no way of knowing (when the program is compiled) the actual type of the object whose reference will be stored in the reference variable.
In an extreme case, for example, the object might be obtained at runtime from a network connection of which the compiler has no knowledge.
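A minimal C# illustration of both kinds:

using System;

class Animal
{
    public virtual void Speak() => Console.WriteLine("Some sound");
}

class Dog : Animal
{
    public override void Speak() => Console.WriteLine("Woof");
}

static class PolymorphismDemo
{
    // Compile-time polymorphism: the overload is chosen from the static types.
    static void Print(int value)    => Console.WriteLine($"int: {value}");
    static void Print(string value) => Console.WriteLine($"string: {value}");

    static void Main()
    {
        Print(1);        // resolved at compile time to Print(int)
        Print("one");    // resolved at compile time to Print(string)

        // Run-time polymorphism: the override is chosen from the runtime type,
        // which could just as well have come from a network connection.
        Animal animal = new Dog();
        animal.Speak();  // prints "Woof", decided at run time
    }
}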

C# vs Java generics [duplicate]

This question already has answers here: What are the differences between Generics in C# and Java... and Templates in C++? [closed] (13 answers). Closed 9 years ago.
I have heard that the Java implementation of generics is not as good as the C# implementation. Given that the syntax looks similar, what is it that is substandard about the Java implementation, or is it a religious point of view?
streloksi's link does a great job of breaking down the differences. The quick and dirty summary though is ...
In terms of syntax and usage, the two languages are roughly the same. There are a few quirks here and there (most notably in constraints), but basically if you can read one, you can likely read/use the other.
The biggest difference though is in the implementation.
Java uses the notion of type erasure to implement generics. In short, the underlying compiled classes are not actually generic; they compile down to Object and casts. In effect, Java generics are a compile-time artifact and can easily be subverted at runtime.
C#, on the other hand, by virtue of the CLR, implements generics all the way down to the bytecode. The CLR took several breaking changes in order to support generics in 2.0. The benefits are performance improvements, deep type-safety verification, and reflection.
Again, the provided link has a much more in-depth breakdown that I encourage you to read.
The difference comes down to a design decision by Microsoft and Sun.
Generics in Java is implemented through type erasure by the compiler, which means that the type checking occurs at compile time, and the type information is removed. This approach was taken to keep the legacy code compatible with new code using generics:
From The Java Tutorials, Generics: Type Erasure:
When a generic type is instantiated, the compiler translates those types by a technique called type erasure — a process where the compiler removes all information related to type parameters and type arguments within a class or method. Type erasure enables Java applications that use generics to maintain binary compatibility with Java libraries and applications that were created before generics.
However, with generics in C# (.NET), there is no type erasure by the compiler, and type checks can be performed at runtime. This has the benefit that the type information is preserved in the compiled code.
From Wikipedia:
This design choice is leveraged to provide additional functionality, such as allowing reflection with preservation of generic types, as well as alleviating some of the limitations of erasure (such as being unable to create generic arrays). This also means that there is no performance hit from runtime casts and normally expensive boxing conversions.
Rather than saying ".NET generics are better than Java generics", one should look into the difference in the approaches to implementing generics. In Java, it appears that preserving compatibility was a high priority, while in .NET (when generics were introduced in version 2.0), realizing their full benefit was a higher priority.
I also found this conversation with Anders Hejlsberg that may be interesting. To summarize the points Anders Hejlsberg made, with some additional notes: Java generics were made for maximum compatibility with the existing JVM, which led to a few odd things compared to the implementation you see in C#:
Type erasure forces the implementation to represent every generic parameterized value as Object. While the compiler provides automatic casts between Object and the more specific type, it does not remove the negative impact of the casts and boxing on performance (e.g. an Object is cast to a specific type MyClass, or an int has to be boxed as an Integer; this would be even more serious for C#/.NET, with its user-defined value types, if it had followed the type-erasure approach). As Anders said: "you don't get any of the execution efficiency" (that reified generics enable in C#).
Type erasure makes information that is available at compile time inaccessible at runtime. Something that used to be List<Integer> becomes just a List, with no way to recover the generic type parameter at runtime. This makes it difficult to build reflection or dynamic code-generation scenarios around Java generics. A more recent SO answer shows a way around it via anonymous classes. But without tricks, something like generating code at runtime via reflection that gets elements from one collection instance and puts them into another collection instance can fail during execution of the dynamically generated code: reflection doesn't help with catching the mismatch between List<Double> and List<Integer> in these situations.
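To see the contrast from the C# side, here is a small illustration: the generic type arguments survive into run time, so the List<Double> versus List<Integer> mismatch that erasure hides in Java is directly observable in .NET.

using System;
using System.Collections.Generic;

class ReifiedGenericsDemo
{
    static void Main()
    {
        var doubles = new List<double>();
        var ints = new List<int>();

        // The type arguments are preserved at run time...
        Console.WriteLine(doubles.GetType());                    // System.Collections.Generic.List`1[System.Double]
        Console.WriteLine(doubles.GetType() == ints.GetType());  // False

        // ...so reflection and runtime checks can tell List<double>
        // apart from List<int>, which erased generics cannot.
        object boxed = ints;
        Console.WriteLine(boxed is List<double>);                // False
        Console.WriteLine(boxed is List<int>);                   // True
    }
}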
But +1 for the answer linking to Jonathan Pryor's blog post.
