Dynamically create compiler error based on API usage?

Dynamically create compiler error based on API usage? - c#

Did not know how to search for an answer to this, but I am wondering if there is a way in a compiled language such as Java, C#, Scala etc... To force a compiler error where an API is used incorrectly.
Say you have some sort of API you are working with and you know you need to call a specific setup method X before calling some other method Y, is it possible to set things up such that the compiler will catch the error and avoid having to do so at run time?
It would be fairly useful for enforcing some code standards or fixing broken API's. No idea if its even possible though.

Tools like NDepend (for .NET) or JArchitect (for java) lets write custom code rules over LINQ queries that can emit warning or error at analysis time (in the IDE, or at Build Process time). For example the following CQLinq code rule enforces that if a method is calling MethodA(), it must call MethodB():
warnif count > 0
from m in Application.Methods where
m.IsUsing("MyNamespace.MyClass.MyMethodA()") &&
!m.IsUsing("MyNamespace.MyClass.MyMethodB()")
select m

Generally, you can't create arbitrary custom compile-time errors in most static languages. There are a few exceptions (like #error directives in C and C++, but even then the preprocessor comes strictly before the main compilation so that won't help).
However, you can utilize a language feature specifically designed to catch errors at compile time:
The type system is your friend.
Requiring methods to be called in a particular order is unfriendly to API consumers, and is a code smell in a high-level language.
A trivial solution is to expose a single method that performs both operations in the correct order. More generally (e.g. if the 2nd method doesn't have to be called), have the 2nd method call the 1st method at the top. But presumably you want the calls to be under the control of the API consumer.
In that case, refactor your code so that it is not possible to call methods in the wrong order without creating a type error. For example, suppose you're writing a regex library where regexes have to be compiled before matching. In Scala, refactor:
class Regex(pattern : String) {
private[this] var compiled : Option[CompiledData] = None
def compile() {
// do stuff
this.compiled = ...
}
def search(s : String) : MatchResult = compiled match {
case Some(c) => ... // match the string
case None => throw new IllegalStateException("must compile regex first")
}
}
to
class Regex(pattern : String) {
def compile : CompiledRegex = {
// do stuff
CompiledRegex(...)
}
}
class CompiledRegex(c : CompiledData) {
def search(s : String) : MatchResult = {
... // match the string
}
}
Since the search method is now only available on CompiledRegex, and CompiledRegex is obtained by calling Regex.compile, it's impossible to call search before that action is made valid by compiling the regex.
This setup also helps out API consumers, because if the user needs an object of type MatchResult and has already typed new Regex("[abc123]*"), a good IDE can autocomplete or suggest the needed method .compile; that would not be possible with the original setup.
(In this example, it also no longer needs mutable state, which is often avoided in Scala.)
Static assertions
As an alternative solution, some languages (I think D and C++11, though not the ones you listed) support static assertions.

the usage of specific functions ist on another level than the compiler operation ... therefore i don't think that a compiler should do things like that ...
on the other hand: compilers are commonly integrated into IDEs ... why don't you put that into an IDE module too? ... most IDEs allow pre compiler operations ...
you could write some sort of checking tool (based on CodeDOM or something similar) to test for specific function usage, and based on that check-result, abort the compiler run if the quality criteria is not matched ...

Related

How to translate or convert CompilerGenerated code?

If you try to use decompilers like: jetbrains dotpeek, redgate reflector, telerik justdecompile, whatever.. Sometimes if you need a code to copy or just to understand, it is not possible because are shown somethings like it:
[CompilerGenerated]
private sealed class Class15
{
// Fields
public Class11.Class12 CS$<>8__locals25;
public string endName;
// Methods
public Class15();
public bool <Show>b__11(object intelliListItem_0);
}
I'm not taking about obfuscation, this is happens at any time, I didsome tests (my own code), and occurs using lambdas and iterators. I'm not sure, could anyone give more information about when and why..?
So, by standard Visual Studio not compile $ and <> keywords in c# (like the code above)...
There is a way to translate or convert this decompiled code automatically?

Lambdas are a form of closure which is a posh way of saying it's a unit of code you can pass around like it was an object (but with access to its original context). When the compiler finds a lambda it generates a new type (Type being a class or struct) which encapsulates the code and any fields accessed by the lambda in its original context.
The problem here is, how do you generate code which will never conflict with user written code?
The compiler's answer is to generate code which is illegal in the language you are using, but legal in IL. IL is "Intermediate Language" it's the native language used by the Common Language Runtime. Any language which runs on the CLR (C#, vb.net, F#) compiles into IL. This is how you get to use VB.Net assemblies in C# code and so on.
So this is why the decompilers generate the hideous code you see. Iterators follow the exact same model as do a bunch of other language features that require generated types.
There is an interesting side effect. The Lambda may capture a variable in its original context:
public void TestCapture()
{
StringBuilder b = new StringBuilder();
Action l = () => b.Append("Kitties!");
}
So by capture I mean the variable b here is included in the package that defines the closure.
The compiler tries to be efficient and create as few types as possible, so you can end up with one generated class that supports all the lambdas found in a specific class, including fields for all the captured variables. In this way, if you're not careful, you can accidentally capture something you expect to be released, causing really tricky to trace memory leaks.

Is there an option to change the target framework?... I know with some decompilers they default to the lowest level framework (C# 1.0)

Method analysis using Reflection and CodeDom

The context of this question is too elaborate to describe here and will likely adversely affect responses so I am not including it. I want to assert certain things about a method in a unit test. Some of these things are possible using reflection such as format of the try/finally block, class fields and method local variables, etc. I already know the type and method signature.
protected override void OnTest ()
{
bool result = false;
SomeCOMObject com = null; // System.__ComObject
try
{
}
finally
{
System.Runtime.InteropServices.Marshal.ReleaseComObject(com);
}
return (result);
}
What I have not been able to achieve are things like:
Whether the method contains only a single return (result); statement and whether that statement is the last one in the function.
Whether all variables of type System.__ComObject have been manually de-referenced using System.Runtime.InteropServices.Marshal.ReleaseComObject(object) in the finally block.
Since some of these things are not possible using reflection, and source code text analysis is far from ideal, I turned to CodeDom but have not been able to get a grip on it. I have been told that creating expression trees from source code is not possible. Nor is it possible to create expression trees from the runtime type. If that is correct, how can I leverage CodeDom to achieve things in the list above?
I have used CodeDom in the past for code generation and compiling simple code classes to assemblies. But I have no idea how it could be used to analyze the internals of a method. Please advise.

In general, reflection built into programming languages provides no access to the content of functions. So you pretty much can't do this with reflection.
You might be able to do it if you have access to the byte-code equivalent, but byte code can't really answer questions about the syntax of the method, e.g., "how many return statements exists returning the same expression".
If you want to reason about code, your need to reason about the source code. This means you need access to a parser, and often other useful facts ("what the declaration of X?", "Is the type of X and Y compatible?", "Does data flow from X to Y?"), etc.
Roslyn provides some of this information. There are also commercial solutions (I have one).

"Ternary" operator for different method signatures

I'm looking for an elegant way to choose a method signature (overloaded) and pass an argument based on a conditional. I have an importer that will either produce the most recent file to import, or take an explicit path for the data.
Currently my code looks like this:
if (string.IsNullOrEmpty(arguments.File))
{
importer.StartImport();
}
else
{
importer.StartImport(arguments.File);
}
I would like it to look like this (or conceptually similar):
importer.StartImport(string.IsNullOrEmpty(arguments.File) ? Nothing : arguments.File);
The idea being that a different method signature would be called. A few stipulations:
1) I will not rely on 'null' to indicate an unspecified file (i.e. anything besides null itself).
2) I will not pass the arguments struct to the importer class; that violates the single responsibility principle.
One solution I'm aware of, is having only one StartImport() method that takes a single string, at which point that method resolves the conditional and chooses how to proceed. I'm currently choosing to avoid this solution because it only moves the if-statement from one method to another. I'm asking this question because:
1) I would like to reduce "8 lines" of code to 1.
2) I'm genuinely curious if C# is capable of something like this.

I would like to reduce "8 lines" of code to 1.
I think you're asking the wrong question. It's not how many lines of code you have, it's how clear, maintainable, and debuggable they are. From what you've described, importing from a default location and importing with a known file are semantically different - so I think you're correct in separating them as two different overloads. In fact, you may want to go further and actually name them differently to further clarify the difference.
I'm genuinely curious if C# is capable of something like this.
Sure, we can use all sorts of fancy language tricks to make this more compact ... but I don't think they make the code more clear. For instance:
// build a Action delegate based on the argument...
Action importAction = string.IsNullOrEmpty(arguments.File)
? () => importer.StartImport()
: () => importer.StartImport(arguments.File)
importAction(); // invoke the delegate...
The code above uses a lambda + closure to create a Action delegate of the right type, which is then invoked. But this is hardly more clear ... it's also slightly less efficient, as it requires creating a delegate and then invoking the method through that delegate. In most cases the performance overhead is completely negligible. The real problem here is the use of the closure - it's very easy to misuse code with closures - and it's entirely possible to introduce bugs by using closures incorrectly.

You can't do this using 'strong-typed' C#. Overload resolution is performed at compile time: the compiler will work out the static (compile-time) type of the argument, and resolve based on that. If you want to use one of two different overloads, you have to perform two different calls.
In some cases, you can defer overload resolution until runtime using the dynamic type. However, this has a performance (and clarity!) cost, and won't work in your situation because you need to pass a different number of arguments in the two cases.

Yes, it is possible (using reflection). But I wouldn't recommend it at all because you end up with way more code than you had before. The "8-lines" which you have are quite simple and readable. The "one-liner" using reflection is not. But for completeness sake, here's a one-liner:
importer.GetType().GetMethod("StartImport", string.IsNullOrEmpty(arguments.File) ? new Type[0] : new Type[] { typeof(string) }).Invoke(importer, string.IsNullOrEmpty(arguments.File) ? new object[0] : new object[] { arguments.File) }));

Frankly, I think your code would be a lot more readable and modular if you combined the overloaded methods back into a single method and moved the conditional inside it instead of making the caller pre-validate the input.
importer.StartImport(arguments.File);
void StartImport(string File) {
if (string.isNullOrEmpty(File)) {
...
}
else {
...
}
}
Assuming you call the method from multiple places in your code you don't have the conditional OR ternary expression scattered around violating the DRY principle with this approach.

//one line
if (string.IsNullOrEmpty(arguments.File)) {importer.StartImport();} else{importer.StartImport(arguments.File);}

When should one use dynamic keyword in c# 4.0?

When should one use dynamic keyword in c# 4.0?.......Any good example with dynamic keyword in c# 4.0 that explains its usage....

Dynamic should be used only when not using it is painful. Like in MS Office libraries. In all other cases it should be avoided as compile type checking is beneficial. Following are the good situation of using dynamic.
Calling javascript method from Silverlight.
COM interop.
Maybe reading Xml, Json without creating custom classes.

How about this? Something I've been looking for and was wondering why it was so hard to do without 'dynamic'.
interface ISomeData {}
class SomeActualData : ISomeData {}
class SomeOtherData : ISomeData {}
interface ISomeInterface
{
void DoSomething(ISomeData data);
}
class SomeImplementation : ISomeInterface
{
public void DoSomething(ISomeData data)
{
dynamic specificData = data;
HandleThis( specificData );
}
private void HandleThis(SomeActualData data)
{ /* ... */ }
private void HandleThis(SomeOtherData data)
{ /* ... */ }
}
You just have to maybe catch for the Runtime exception and handle how you want if you do not have an overloaded method that takes the concrete type.
Equivalent of not using dynamic will be:
public void DoSomething(ISomeData data)
{
if(data is SomeActualData)
HandleThis( (SomeActualData) data);
else if(data is SomeOtherData)
HandleThis( (SomeOtherData) data);
...
else
throw new SomeRuntimeException();
}

As described in here dynamics can make poorly-designed external libraries easier to use: Microsoft provides the example of the Microsoft.Office.Interop.Excel assembly.
And With dynamic, you can avoid a lot of annoying, explicit casting when using this assembly.
Also, In opposition to #user2415376 ,It is definitely not a way to handle Interfaces since we already have Polymorphism implemented from the beginning days of the language!
You can use
ISomeData specificData = data;
instead of
dynamic specificData = data;
Plus it will make sure that you do not pass a wrong type of data object instead.

Check this blog post which talks about dynamic keywords in c#. Here is the gist:
The dynamic keyword is powerful indeed, it is irreplaceable when used with dynamic languages but can also be used for tricky situations while designing code where a statically typed object simply will not do.
Consider the drawbacks:
There is no compile-time type checking, this means that unless you have 100% confidence in your unit tests (cough) you are running a risk.
The dynamic keyword uses more CPU cycles than your old fashioned statically typed code due to the additional runtime overhead, if performance is important to your project (it normally is) don’t use dynamic.
Common mistakes include returning anonymous types wrapped in the dynamic keyword in public methods. Anonymous types are specific to an assembly, returning them across assembly (via the public methods) will throw an error, even though simple testing will catch this, you now have a public method which you can use only from specific places and that’s just bad design.
It’s a slippery slope, inexperienced developers itching to write something new and trying their best to avoid more classes (this is not necessarily limited to the inexperienced) will start using dynamic more and more if they see it in code, usually I would do a code analysis check for dynamic / add it in code review.

Here is a recent case in which using dynamic was a straightforward solution. This is essentially 'duck typing' in a COM interop scenario.
I had ported some code from VB6 into C#. This ported code still needed to call other methods on VB6 objects via COM interop.
The classes needing to be called looked like this:
class A
{
void Foo() {...}
}
class B
{
void Foo() {...}
}
(i.e., this would be the way the VB6 classes looked in C# via COM interop.
Since A and B are independent of each other you can't cast one to the other, and they have no common base class (COM doesn't support that AFAIK and VB6 certainly didn't. And they did not implement a common interface - see below).
The original VB6 code which was ported did this:
' Obj must be either an A or a B
Sub Bar(Obj As Object)
Call Obj.Foo()
End Sub
Now in VB6 you can pass things around as Object and the runtime will figure out if those objects have method Foo() or not. But in C# a literal translation would be:
// Obj must be either an A or a B
void Bar(object Obj)
{
Obj.Foo();
}
Which will NOT work. It won't compile because object does not have a method called "Foo", and C# being typesafe won't allow this.
So the simple "fix" was to use dynamic, like this:
// Obj must be either an A or a B
void Bar(dynamic Obj)
{
Obj.Foo();
}
This defers type safety until runtime, but assuming you've done it right works just fine.
I wouldn't endorse this for new code, but in this situation (which I think is not uncommon judging from other answers here) it was valuable.
Alternatives considered:
Using reflection to call Foo(). Probably would work, but more effort and less readable.
Modifying the VB6 library wasn't on the table here, but maybe there could be an approach to define A and B in terms of a common interface, which VB6 and COM would support. But using dynamic was much easier.
Note: This probably will turn out to be a temporary solution. Eventually if the remaining VB6 code is ported over then a proper class structure can be used.

I will like to copy an excerpt from the code project post, which define that :
Why use dynamic?
In the statically typed world, dynamic gives developers a lot of rope
to hang themselves with. When dealing with objects whose types can be
known at compile time, you should avoid the dynamic keyword at all
costs. Earlier, I said that my initial reaction was negative, so what
changed my mind? To quote Margret Attwood, context is all. When
statically typing, dynamic doesn't make a stitch of sense. If you are
dealing with an unknown or dynamic type, it is often necessary to
communicate with it through Reflection. Reflective code is not easy to
read, and has all the pitfalls of the dynamic type above. In this
context, dynamic makes a lot of sense.[More]
While Some of the characteristics of Dynamic keyword are:
Dynamically typed - This means the type of variable declared is
decided by the compiler at runtime time.
No need to initialize at the time of declaration.
e.g.,
dynamic str;
str=”I am a string”; //Works fine and compiles
str=2; //Works fine and compiles
Errors are caught at runtime
Intellisense is not available since the type and its related methods and properties can be known at run time only. [https://www.codeproject.com/Tips/460614/Difference-between-var-and-dynamic-in-Csharp]

It is definitely a bad idea to use dynamic in all cases where it can be used. This is because your programs will lose the benefits of compile-time checking and they will also be much slower.

What is 'unverifiable code' and why is it bad?

I am designing a helper method that does lazy loading of certain objects for me, calling it looks like this:
public override EDC2_ORM.Customer Customer {
get { return LazyLoader.Get<EDC2_ORM.Customer>(
CustomerId, _customerDao, ()=>base.Customer, (x)=>Customer = x); }
set { base.Customer = value; }
}
when I compile this code I get the following warning:
Warning 5 Access to member
'EDC2_ORM.Billing.Contract.Site'
through a 'base' keyword from an
anonymous method, lambda expression,
query expression, or iterator results
in unverifiable code. Consider moving
the access into a helper method on the
containing type.
What exactly is the complaint here and why is what I'm doing bad?

"base.Foo" for a virtual method will make a non-virtual call on the parent definition of the method "Foo". Starting with CLR 2.0, the CLR decided that a non-virtual call on a virtual method can be a potential security hole and restricted the scenarios in which in can be used. They limited it to making non-virtual calls to virtual methods within the same class hierarchy.
Lambda expressions put a kink in the process. Lambda expressions often generate a closure under the hood which is a completely separate class. So the code "base.Foo" will eventually become an expression in an entirely new class. This creates a verification exception with the CLR. Hence C# issues a warning.
Side Note: The equivalent code will work in VB. In VB for non-virtual calls to a virtual method, a method stub will be generated in the original class. The non-virtual call will be performed in this method. The "base.Foo" will be redirected into "StubBaseFoo" (generated name is different).

I suspect the problem is that you're basically saying, "I don't want to use the most derived implementation of Customer - I want to use this particular one" - which you wouldn't be able to do normally. You're allowed to do it within a derived class, and for good reasons, but from other types you'd be violating encapsulation.
Now, when you use an anonymous method, lambda expression, query expression (which basically uses lambda expressions) or iterator block, sometimes the compiler has to create a new class for you behind the scenes. Sometimes it can get away with creating a new method in the same type for lambda expressions, but it depends on the context. Basically if any local variables are captured in the lambda expression, that needs a new class (or indeed multiple classes, depending on scope - it can get nasty). If the lambda expression only captures the this reference, a new instance method can be created for the lambda expression logic. If nothing is captured, a static method is fine.
So, although the C# compiler knows that really you're not violating encapsulation, the CLR doesn't - so it treats the code with some suspicion. If you're running under full trust, that's probably not an issue, but under other trust levels (I don't know the details offhand) your code won't be allowed to run.
Does that help?

copy/pasting from here:
Codesta: C#/CLR has 2 kinds of code, safe and unsafe. What is it trying to provide and how did this affect the virtual machine?
Peter Hallam: For C# the terms are safe and unsafe. The CLR uses the terms verifiable and unverifiable.
When running verifiable code the CLR can enforce security policies; the CLR can prevent verifiable code from doing things that it doesn't have permission to do. When running potentially malicious code, code that was downloaded from the internet for example, the CLR will only run verifiable code, and will ensure that the untrusted code doesn't access anything that it doesn't have permission to access.
The use of standard C style pointers creates unverifiable code. The CLR supports C style pointers natively. Once you've got a C style pointer you can read or write to any byte of memory in the process, so the runtime cannot enforce security policy. Actually it could but the performance penalty would make it impractical.
Now, that does not fully answer your question (i.e. WHY is this now unverifiable code), but at least it explains that "unverifiable" is the CLR-term for "unsafe". I assume that anonymous methods and base classes result in some funky pointer-magic internally.
By the Way: I think that the code snippet does not match the Warning message. The code is talking about a Customer, the Warning is about the Billing. Is it possible to post the actuon code the warning is generated for? Maybe you have something else in that code that would better explain why you get the warning.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.