The request:
I'd like to be able to write an analyzer that can provide a proxy value for a certain expression and trigger a re-parsing of the document.
The motivation:
Our code is littered with ABTests that can be either in a deployed or active state with a control and variant group.
Determining a test's state is done through a database lookup.
For the tests that are deployed with the control group, any statement of the following form will evaluate to false:
if(ExperimentService.IsInVariant(ABTest.Test1))
{
}
I'm trying to provide tooling to make this easier to deal with at development time by greying it out in this scenario.
As it is, this is fairly limited and not robust because I basically have to play parser myself.
What if the actual code is
if(!ExperimentService.IsInVariant(ABTest.Test1))
or
if(ExperimentService.IsInVariant(ABTest.Test1) || true)
or
var val = ..... && ExperimentService.IsInVariant(ABTest.Test1);
if(val){
// val is always going to be false if we deployed control.
}
A possible approach would be to let us write analyzers that rewrite the tree before the actual IDE parsing happens (or, well, just parse it a second time). These should only fire once and allow us to replace a certain expression with another, which would let me swap all of these experiment calls for true and false literals.
As a result, these sections could benefit from all the other IDE features, such as greying out unreachable code, but also more intricate ones, like flagging a variable that will never have a different value.
Obviously this is just an example and I'm not sure how feasible it is. Any suggestions for a proper feature or something that already exists are more than welcome.
I don't think there's an approach that doesn't have a compromise.
ReSharper doesn't support rewriting the AST before analysis - that would just rewrite the text in the file.
You could write an analyser that greys out the code by applying a "dead code" highlight to the contents of the if block, but as you say, you'd need to parse the code and analyse control flow in order to get it correct, and I think that would be very difficult. (ReSharper does provide a control flow graph, so you could walk it, but it would be up to you to (a) find the return value of IsInVariant and (b) trace that value through whatever conditions, && or || expressions, until you find an appropriate if block.)
Alternatively, you could mark the IsInVariant method with the ContractAnnotation attribute, something like:
[ContractAnnotation("=> false")]
public bool IsInVariant(string identifier)
{
// whatever...
return false; // placeholder body so the snippet compiles
}
This will tell ReSharper's analysis that this method always returns false (you can also say it will return true/false/null/not null based on specific input). Because it always returns false, ReSharper will grey out the code in the if statement, or the else branch if you do if (!IsInVariant(…)).
The downside here is that ReSharper will also add a warning to the if statement to tell you that the expression always returns false. So, it's a compromise, but you could change the severity of that warning to Hint, so it's not so intrusive.
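For illustration, here's roughly what the call site from the question would then look like (a sketch; the annotation above is assumed to be applied to ExperimentService.IsInVariant):

// ReSharper now knows IsInVariant always returns false, so the if body
// is marked as dead code and greyed out, with the "always false" warning
// on the condition (downgradable to Hint).
if (ExperimentService.IsInVariant(ABTest.Test1))
{
    // greyed out
}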
This is not enough to really warrant the bounty, but one solution that might apply, based on the developer documentation, is to create a custom language and extend the basic rules.
You said
I'm trying to provide tooling to make this easier to deal with at development time by greying it out in this scenario.
Greying out the corresponding parts might just be done by altering syntax highlighting rules.
See this example for .tt files.
Related
I'm writing a library that has several public classes and methods, as well as several private or internal classes and methods that the library itself uses.
In the public methods I have a null check and a throw like this:
public int DoSomething(int number)
{
    if (number == null)
    {
        throw new ArgumentNullException(nameof(number));
    }
    // ... do the actual work ...
    return number;
}
But then this got me thinking, to what level should I be adding parameter null checks to methods? Do I also start adding them to private methods? Should I only do it for public methods?
Ultimately, there isn't a uniform consensus on this. So instead of giving a yes or no answer, I'll try to list the considerations for making this decision:
Null checks bloat your code. If your procedures are concise, the null guards at the beginning of them may form a significant part of the overall size of the procedure, without expressing the purpose or behaviour of that procedure.
Null checks expressively state a precondition. If a method is going to fail when one of the values is null, having a null check at the top is a good way to demonstrate this to a casual reader without them having to hunt for where it's dereferenced. To improve this, people often use helper methods with names like Guard.AgainstNull, instead of having to write the check each time (see the sketch after this list).
Checks in private methods are untestable. By introducing a branch in your code which you have no way of fully traversing, you make it impossible to fully test that method. This conflicts with the point of view that tests document the behaviour of a class, and that that class's code exists to provide that behaviour.
The severity of letting a null through depends on the situation. Often, if a null does get into the method, it'll be dereferenced a few lines later and you'll get a NullReferenceException. This really isn't much less clear than throwing an ArgumentNullException. On the other hand, if that reference is passed around quite a bit before being dereferenced, or if throwing an NRE will leave things in a messy state, then throwing early is much more important.
Some libraries, like .NET's Code Contracts, allow a degree of static analysis, which can add an extra benefit to your checks.
If you're working on a project with others, there may be existing team or project standards covering this.
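As a minimal sketch of the Guard.AgainstNull idea from the second point above (the class and method names are just a common convention, not any specific library):

using System;

public static class Guard
{
    // Throws if value is null; otherwise returns it, so the guard can be
    // used inline in assignments and constructor bodies.
    public static T AgainstNull<T>(T value, string paramName) where T : class
    {
        if (value == null) throw new ArgumentNullException(paramName);
        return value;
    }
}

// Usage (hypothetical types):
// _customer = Guard.AgainstNull(customer, nameof(customer));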
If you're not a library developer, don't be defensive in your code
Write unit tests instead
In fact, even if you're developing a library, throwing is, most of the time, BAD:
1. Testing null on an int must never be done in C#:
It raises warning CS0472, because the result is always false.
2. Throwing an exception means it's exceptional: abnormal and rare.
It should never be raised in production code. Especially because exception stack trace traversal can be a CPU-intensive task, and you'll never be sure where the exception will be caught, whether it's caught and logged, or simply silently ignored (after killing one of your background threads), because you don't control the user code. There are no "checked exceptions" in C# (unlike Java), which means you never know, unless it's well documented, what exceptions a given method could raise. By the way, that kind of documentation must be kept in sync with the code, which is not always easy to do (it increases maintenance costs).
3. Exceptions increase maintenance costs.
As exceptions are thrown at runtime and only under certain conditions, they can be detected really late in the development process. As you may already know, the later an error is detected in the development process, the more expensive the fix will be. I've even seen exception-raising code make its way into production and not raise for a week, only to raise every day thereafter (killing production. Oops!).
4. Throwing on invalid input means you don't control the input.
That is the case for public methods of libraries. However, if you can check it at compile time with another type (for example, a non-nullable type like int), then that's the way to go. And of course, as they are public, it's their responsibility to check the input.
Imagine a user who passes what he thinks is valid data, and then, by a side effect, a method deep in the stack trace throws an ArgumentNullException.
What will his reaction be?
How can he cope with that?
Will it be easy for you to provide an explanatory message?
5. Private and internal methods should never ever throw exceptions related to their input.
You may throw exceptions in your code because an external component (maybe a database, a file or something else) is misbehaving and you can't guarantee that your library will continue to run correctly in its current state.
Making a method public doesn't mean that it should be called from outside of your library, only that it can (look at Public versus Published from Martin Fowler). Use IoC, interfaces and factories, and publish only what's needed by the user, while making the whole library's classes available for unit testing (or you can use the InternalsVisibleTo mechanism).
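As a one-line illustration of InternalsVisibleTo (the test assembly name "MyLibrary.Tests" is a hypothetical placeholder):

// In any source file of the library, typically AssemblyInfo.cs:
[assembly: System.Runtime.CompilerServices.InternalsVisibleTo("MyLibrary.Tests")]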
6. Throwing exceptions without any explanatory message is making fun of the user.
No need to recall the feelings one has when a tool is broken without giving any clue on how to fix it. Yes, I know. You come to SO and ask a question...
7. Invalid input means it breaks your code
If your code can produce a valid output with the value then it's not invalid and your code should manage it. Add a unit test to test this value.
8. Think in user terms:
Do you like it when a library you use throws exceptions in your face? Like: "Hey, it's invalid, you should have known that!"
Even if, from your point of view, with your knowledge of the library internals, the input is invalid, how can you explain it to the user (be kind and polite)?
Clear documentation (in XML doc; an architecture summary may help).
Publish the XML doc with the library.
Clear error explanation in the exception, if any.
Give them the choice:
Look at the Dictionary class. Which call do you prefer? Which do you think is fastest? Which can raise an exception?
Dictionary<string, string> dictionary = new Dictionary<string, string>();
string res;
dictionary.TryGetValue("key", out res);
or
var other = dictionary["key"];
9. Why not use Code Contracts?
It's an elegant way to avoid the ugly if-then-throw and to isolate the contract from the implementation, permitting you to reuse the contract for different implementations at the same time. You can even publish the contract to your library's users to further explain to them how to use the library.
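As a minimal sketch of that style using System.Diagnostics.Contracts (note that enforcing Contract.Requires at runtime needs the separate Code Contracts binary rewriter installed; the method body here is just an illustrative placeholder):

using System.Diagnostics.Contracts;

public int DoSomething(int number)
{
    // The precondition is stated up front, separate from the implementation,
    // and can be consumed by static analysis tools.
    Contract.Requires(number > 0);
    return number * 2;
}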
As a conclusion, even if you can easily use throw, and even if you frequently see exceptions raised when using the .NET Framework, that doesn't mean they should be used without caution.
Here are my opinions:
General Cases
Generally speaking, it is better to check for invalid inputs before you process them in a method, for robustness reasons, be it in private, protected, internal, protected internal, or public methods. Although there is some performance cost with this approach, in most cases it is worth paying, rather than spending more time debugging and patching the code later.
Strictly Speaking, however...
Strictly speaking, however, it is not always necessary to do so. Some methods, usually private ones, can be left without any input checking provided that you have a full guarantee that there isn't a single call to the method with invalid inputs. This may give you some performance benefit, especially if the method is called frequently to do some basic computation or action. In such cases, checking input validity may impair performance significantly.
Public Methods
Now, public methods are trickier. This is because, strictly speaking, although the access modifier alone can tell you who can use the methods, it cannot tell you who will use them. Moreover, it also cannot tell you how the methods are going to be used (that is, whether they are going to be called with invalid inputs in the given scopes or not).
The Ultimate Determining Factor
Although access modifiers for methods in the code can hint at how the methods should be used, ultimately it is humans who will use them, and it is up to the humans how they are going to use them and with what inputs. Thus, in some rare cases, it is possible to have a public method which is only called in some private scope, where the inputs for the public method are guaranteed to be valid before it is called.
In such cases, even though the access modifier is public, there isn't any real need to check for invalid inputs, except for robust design reasons. And why is this so? Because there are humans who know completely when and how the method shall be called!
Here we can see there is no guarantee that a public method always requires checking for invalid inputs. And if this is true for public methods, it must also be true for protected, internal, protected internal, and private methods.
Conclusions
So, in conclusion, we can say a couple of things to help us make decisions:
Generally, it is better to check for invalid inputs, for robust design reasons, provided that performance is not at stake. This is true for any access modifier.
The invalid-input check can be skipped if doing so yields a significant performance gain, provided that it can also be guaranteed that the scope from which the methods are called always gives them valid inputs.
A private method is usually where we skip such checking, but there is no guarantee that we cannot do the same for a public method.
Humans are the ones who ultimately use the methods. Regardless of how the access modifiers hint at their use, how the methods are actually used and called depends on the coders. Thus, we can only speak of general good practice, without restricting it to be the only way of doing things.
The public interface of your library deserves tight checking of preconditions, because you should expect the users of your library to make mistakes and violate the preconditions by accident. Help them understand what is going on in your library.
The private methods in your library do not require such runtime checking because you call them yourself. You are in full control of what you are passing. If you want to add checks because you are afraid to mess up, then use asserts. They will catch your own mistakes, but do not impede performance during runtime.
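As a minimal sketch of the asserts approach (Debug.Assert calls are compiled away in Release builds, so they cost nothing at runtime; the method itself is a hypothetical example):

using System.Diagnostics;

private int Half(int number)
{
    // Catches the library author's own mistakes during development only;
    // removed entirely from Release builds.
    Debug.Assert(number % 2 == 0, "number must be even");
    return number / 2;
}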
Though you tagged this language-agnostic, it seems to me there probably isn't a general answer.
Notably, in your example you hinted at the argument's type: with a language that supports type hints, an error will fire as soon as the function is entered, before you can take any action.
In such a case, the only solution is to have checked the argument before calling your function... but since you're writing a library, that makes no sense!
On the other hand, with no hinting, it remains realistic to check inside the function.
So at this point in the reasoning, I'd already suggest giving up on hinting.
Now let's go back to your precise question: at what level should it be checked?
For a given piece of data, it would happen only at the highest level where it can "enter" (there may be several entry points for the same data), so logically it would concern only public methods.
That's the theory. But maybe you're planning a huge, complex library, so it might not be easy to be certain you've registered all the "entry points".
In that case, I'd suggest the opposite: simply apply your checks everywhere, then omit them only where you clearly see they're duplicates.
Hope this helps.
In my opinion you should ALWAYS check for "invalid" data, independent of whether it is a private or public method.
Looked at from the other way: why should you be able to work with something invalid just because the method is private? That doesn't make sense, right? Always try to use defensive programming and you will be happier in life ;-)
This is a question of preference. But consider instead why you are checking for null, or rather checking for valid input. It's probably because you want to let the consumer of your library know when he/she is using it incorrectly.
Let's imagine that we have implemented a class PersonList in a library. This list can only contain objects of the type Person. We have also implemented some operations on our PersonList, and therefore we do not want it to contain any null values.
Consider the two following implementations of the Add method for this list:
Implementation 1
public void Add(Person item)
{
if(_size == _items.Length)
{
EnsureCapacity(_size + 1);
}
_items[_size++] = item;
}
Implementation 2
public void Add(Person item)
{
if(item == null)
{
throw new ArgumentNullException("Cannot add null to PersonList");
}
if(_size == _items.Length)
{
EnsureCapacity(_size + 1);
}
_items[_size++] = item;
}
Let's say we go with implementation 1:
Null values can now be added to the list.
All operations implemented on the list will have to handle these null values.
If we check for and throw an exception in our operations, the consumer will be notified of the exception only when he/she calls one of the operations, and at that point it will be very unclear what he/she did wrong (it just wouldn't make sense to go with this approach).
If we instead choose to go with implementation 2, we make sure input to our library has the quality that we require for our class to operate on it. This means we only need to handle it here, and then we can forget about it while implementing our other operations.
It will also be much clearer to the consumer that he/she is using the library in the wrong way when he/she gets an ArgumentNullException on .Add instead of in .Sort or similar.
To sum it up, my preference is to check for valid arguments when they are supplied by the consumer, not inside the private/internal methods of the library. This basically means we have to check arguments in constructors/methods that are public and take parameters. Our private/internal methods can only be called from our public ones, and those have already checked the input, which means we are good to go!
Using Code Contracts should also be considered when verifying input.
This is very dangerous, so I wonder why it's allowed. Since I often need to switch between VB.NET and C#, I sometimes add breakpoint conditions like the following:
foo = "bah"
I want to stop if the string variable foo is "bah", so the correct way would have been to use foo == "bah" instead of foo = "bah".
But it "works". You don't get any warnings or errors at compile time or runtime. But it actually modifies the variable foo: it makes it always "bah", even if it had a different value. Since this happens silently (the breakpoint never gets hit), it is incredibly dangerous.
Why is it allowed? Where is my error in reasoning (apart from confusing the C# and VB.NET syntax)? In C# (as opposed to VB.NET), an assignment is an expression that returns the value that was assigned, so not a bool but a string in this case. But a breakpoint condition has to be a bool if you check the box "Is True".
Here is a little sample "program" and screenshots from my (German) IDE:
static void Main()
{
string foo = "foo";
// breakpoint with assignment (foo = "bah") instead of comparison (foo == "bah"):
Console.WriteLine(foo); // bah, variable is changed from the breakpoint window
}
The breakpoint-condition dialog:
The code as image including the breakpoint:
It is an automatic consequence of C# syntax, common in the curly-brace language group: an assignment is also an expression, and its result is the value of the right-hand side operand. The debugger does not object to expressions having side effects, nor would it be simple at all to suppress them. It could be blamed for not checking that the expression has a bool result, but the debugger does not have a full-blown C# language parser. This might well be fixed in VS2015 thanks to the Roslyn project. [Note: see addendum at the bottom.]
This is also the core reason that the curly-brace languages need a separate operator for equality, == vs =, which in itself must be responsible for a billion dollars' worth of bugs; every C programmer makes that mistake at least once.
VB.NET is different, assignment is a statement and the = token is valid for both assignment and comparison. You can tell from the debugger, it picks the equality operator instead and doesn't modify the variable.
Do keep in mind that this is actually useful. It lets you temporarily work around a bug by forcing the variable value, allowing you to continue debugging and focus on another problem. Or create a test condition. That's pretty useful. In a previous lifetime, I wrote a compiler and debugger and implemented "tracepoints". I discovered the same scenario by accident and left it in place. It ran in a host that relied heavily on state machines; overriding the state variable while debugging was incredibly useful. The accident, no, not so useful :)
A note about what other SO users are observing: it depends on the debugging engine that you use. The relevant option in VS2013 is Tools + Options, Debugging, General, the "Use Managed Compatibility Mode" checkbox. The same option exists in VS2012 under a slightly different name (I don't remember which). When ticked, you get an older debugging engine, one that is still compatible with C++/CLI, the same one used in VS2010.
So the workaround for VS2013 is to untick the option, which makes the debugger check that the expression produces a bool result. You also get some goodies with the new debugging engine, like seeing method return values and Edit + Continue support for 64-bit processes.
I disagree about the buggy nature of this behavior. A few examples from my own debugging where it was useful:
1. Another thread modifies something in your code.
2. Another service updated a value in the DB.
So I suppose it can be a useful feature for synchronization cases, but I agree that it can cause problems.
In Visual Studio 2019, variable assignment expressions in Breakpoint conditions no longer work.
The expression will be evaluated for its value, but side effects such as variable assignment are discarded.
https://developercommunity.visualstudio.com/content/problem/715716/variables-cannot-be-assigned-in-breakpoint-conditi.html
Is there any solution similar to PostSharp or Infuse (a precompiler for C#) that lets me modify code at compile time?
Below is pseudo code.
[InterceptCallToConstructors]
void Method1()
{
    Person Eric = new Person("Eric Bush");
}

InterceptCallToConstructors(ConstructorMethodArgs args)
{
    if (args.Type == typeof(Person))
        if (PersonInstances++ > 10) args.ReturnValue = null;
}
In this example, Eric should not be assigned a new Person instance if more than 10 Persons have been created.
After some research I found two solutions: PostSharp and Infuse.
With Infuse it's very complicated and hard to detect how many instances of Person are made; with PostSharp, however, it's one line of code to detect.
I have tried to go the AOP route with PostSharp, but PostSharp currently doesn't support intercepting constructor calls with an aspect.
As far as I have read, Roslyn doesn't support modifying code at compile time.
This is a "custom preprocessor" answer that modifies the source code to achieve OP's effect.
Our DMS Software Reengineering Toolkit with its C# Front End can do this.
DMS provides source-to-source transformations, with the transformations coded as
if you see *this*, replace it by *that*
This is written in the form:
rule xxx pattern_parameters
this_pattern
-> that_pattern ;
The "->" is pronounced "replace by: :-}
DMS operates on ASTs, so includes a parsing step (text to ASTs), a tree transformation step, and a prettyprinting step that produces the final answer (ASTs to text).
OP seems to want to modify the constructor call site (he can't modify the constructor; there's no way to get it to return "null"). To accomplish OP's task, he would provide DMS the following source-to-source transformation specification:
default domain CSharp~v5; -- says we are working with C# syntax (and need the C# front end)
rule intercept_constructor(c: IDENTIFIER, a:arguments): expression
" new \c (\a) "
-> " \c.PersonInstances==10?null:(PersonInstances++,new \c (\a)) "
if c == "Person"; -- one might want to force c to be on some qualified path
What the rule does is find matching constructor-call syntax of arbitrary form and replace it by a conditional expression that checks OP's precondition, returning null if there are too many Person instances (we fix a bug in OP's spec here; he appears to increment the count whether a new Person instance is created or not, surely not his intention). We have to qualify PersonInstances's location; it can't just be floating around in the ether. In this example I'm proposing it is a static member of the class.
The details: each rule has a name ("intercept_constructor", stolen from OP). It refers to a syntactic category ("expression") with syntactic shape "new \c (\a)", forcing it to match only constructor calls that are expressions. The quotes in the rule are meta-quotes; they distinguish the syntax of the rule language from the syntax of the targeted language (C# in this case). The backslashes are meta-escapes; \c in meta-quotes is the same thing in the rule as c outside the meta-quotes, and similarly for \a.
In a really big system there may be several Person classes. We want to make sure we get the right one; one might need to qualify the referenced class as being a specific one by providing a path. OP hints at this with the annotation. If one wanted to check that an annotation existed on the containing method, one would need a custom special predicate to ask for that. DMS provides complete facilities for coding such a predicate, including complete access to the AST, so the predicate can climb up or down in its search for a matching annotation.
If you're running on top of the KRuntime (-> ASP.NET 5) you can hook into the compilation by implementing the ICompileModule assembly-neutral interface.
I'd recommend looking at:
the aop example in the repo
this nice writeup
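For orientation, here is a rough sketch of the shape such a module takes (the context type names varied across the DNX/KRuntime beta releases, so treat the exact signatures as assumptions to verify against your version; ReplaceSyntaxTree is a real Roslyn compilation API):

// Assumed DNX-era interface shape; verify against the beta you're running.
public class MyCompileModule : ICompileModule
{
    public void BeforeCompile(BeforeCompileContext context)
    {
        // The Roslyn compilation is exposed before emit, so syntax trees can
        // be rewritten here, e.g. via CSharpCompilation.ReplaceSyntaxTree.
    }

    public void AfterCompile(AfterCompileContext context)
    {
        // Inspect diagnostics or the emitted assembly here.
    }
}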
Just as the question says, I'm stuck with .NET 2.0 here at my company and there is no chance of upgrading. Is there a way to make Irony work on .NET 2.0?
I'll briefly try to explain what I'm trying to achieve. At our company we have a payroll system where we let customers define their own formulas for payslip items. These formulas must use certain keywords/functions, which we check through Regex (very unreliable). Then we create a Microsoft Script Control object and use its Eval function to parse the snippet. These snippets in turn call the keywords/functions defined in a class to get the output.
But to say this system has drawbacks is being very polite. Sometimes it just overflows with errors.
A very silly example:
FVAL(A)/(FVAL(B)-FVAL(C))
Obviously there are two stages this formula passes through.
First, the validation stage, where we do the following:
Check through regex whether any keywords are used which are not present in a predefined ArrayList of keywords.
Then check whether any variables are used (in this case A, B and C) which are not present in an ArrayList of variables.
If all validation passes, we pass the formula string to a ScriptControlClass object's Eval function and also add a static class with all the functions defined in it (in this case, consider FVAL()); during validation each function returns boolean true if the variable is valid to use in this context (for example, adding a date and a number would return false).
Now, after evaluating this expression, the result is a DivideByZeroException. Why? Because Eval typecasts every boolean true to 1 and performs normal arithmetic on it:
FVAL(A)/(FVAL(B)-FVAL(C)) => 1/(1-1)
The second phase is almost the same, but without the validation checks: instead of returning true, FVAL actually returns the value of the variable from a DataTable that holds values for each variable.
The system is antiquated and it's seriously getting on my nerves, and I really need help. Please advise.
If you're just talking about Excel-style formulas, it's pretty easy to write a good interpreter, especially if you only have to deal with one data type. Someone with a background in compilers (or even parsers) could probably write one up in less than a day. I've written a bunch myself.
Embedding a scripting language is another option, but you'd have to be comfortable with the quirks of the given language. Ensuring that high-precision numbers are used could be difficult, if that's a business requirement.
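To make that concrete, here is a minimal sketch of such an interpreter: a recursive-descent evaluator for +, -, *, / and parentheses over decimal (high-precision arithmetic, per the point above). Function calls like FVAL(A) would be handled by extending ParseFactor; all names here are illustrative, not from any existing library.

using System;
using System.Globalization;

class FormulaEvaluator
{
    private readonly string _s;
    private int _pos;

    public FormulaEvaluator(string formula) { _s = formula; }

    public decimal Evaluate()
    {
        _pos = 0;
        return ParseExpr();
    }

    // expr := term (('+' | '-') term)*
    private decimal ParseExpr()
    {
        decimal v = ParseTerm();
        while (Peek() == '+' || Peek() == '-')
            v = Next() == '+' ? v + ParseTerm() : v - ParseTerm();
        return v;
    }

    // term := factor (('*' | '/') factor)*
    private decimal ParseTerm()
    {
        decimal v = ParseFactor();
        while (Peek() == '*' || Peek() == '/')
            v = Next() == '*' ? v * ParseFactor() : v / ParseFactor();
        return v;
    }

    // factor := number | '(' expr ')'   (extend here for FVAL(...) etc.)
    private decimal ParseFactor()
    {
        if (Peek() == '(')
        {
            Next();                            // consume '('
            decimal v = ParseExpr();
            if (Next() != ')') throw new FormatException("expected ')'");
            return v;
        }
        SkipWs();
        int start = _pos;
        while (_pos < _s.Length && (char.IsDigit(_s[_pos]) || _s[_pos] == '.'))
            _pos++;
        return decimal.Parse(_s.Substring(start, _pos - start),
                             CultureInfo.InvariantCulture);
    }

    private char Peek() { SkipWs(); return _pos < _s.Length ? _s[_pos] : '\0'; }
    private char Next() { SkipWs(); return _s[_pos++]; }
    private void SkipWs() { while (_pos < _s.Length && _s[_pos] == ' ') _pos++; }
}

// Usage: new FormulaEvaluator("1/(1+1)").Evaluate() returns 0.5m.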
If I do this I get a System.StackOverflowException:
private string abc = "";
public string Abc
{
get
{
return Abc; // Note the mistaken capitalization
}
}
I understand why -- the property is referencing itself, leading to an infinite loop. (See previous questions here and here).
What I'm wondering (and what I didn't see answered in those previous questions) is why doesn't the C# compiler catch this mistake? It checks for some other kinds of circular reference (classes inheriting from themselves, etc.), right? Is it just that this mistake wasn't common enough to be worth checking for? Or is there some situation I'm not thinking of, when you'd want a property to actually reference itself in this way?
You can see the "official" reason in the last comment here.
Posted by Microsoft on 14/11/2008 at 19:52

Thanks for the suggestion for Visual Studio!

You are right that we could easily detect property recursion, but we can't guarantee that there is nothing useful being accomplished by the recursion. The body of the property could set other fields on your object which change the behavior of the next recursion, could change its behavior based on user input from the console, or could even behave differently based on random values. In these cases, a self-recursive property could indeed terminate the recursion, but we have no way to determine if that's the case at compile-time (without solving the halting problem!).

For the reasons above (and the breaking change it would take to disallow this), we wouldn't be able to prohibit self-recursive properties.

Alex Turner
Program Manager
Visual C# Compiler
Another point in addition to Alex's explanation is that we try to give warnings for code which does something that you probably didn't intend, such that you could accidentally ship with the bug.
In this particular case, how much time would the warning actually save you? A single test run. You'll find this bug the moment you test the code, because it always immediately crashes and dies horribly. The warning wouldn't actually buy you much of a benefit here. The likelihood that there is some subtle bug in a recursive property evaluation is low.
By contrast, we do give a warning if you do something like this:
int customerId;
...
this.customerId = this.customerId;
There's no horrible crash-and-die, and the code is valid code; it assigns a value to a field. But since this is nonsensical code, you probably didn't mean to do it. Since it's not going to die horribly, we give a warning that there's something here that you probably didn't intend and might not otherwise discover via a crash.
A property referring to itself does not always lead to infinite recursion and a stack overflow. For example, this works fine:
int count = 0;
public string Abc
{
    get
    {
        count++;
        if (count < 10) return Abc; // recurses, but the recursion is bounded
        return "Foo";
    }
}
The above is a dummy example, but I'm sure one could come up with useful recursive code that is similar. The compiler cannot determine whether infinite recursion will happen (the halting problem).
Generating a warning in the simple case would be helpful.
They probably considered it would unnecessarily complicate the compiler without any real gain.
You will discover this typo easily the first time you call the property.
First of all, you'll get a warning for the unused field abc.
Second, there is nothing bad in the recursion as such, provided that it's not endless. For example, the code might adjust some inner variables and then call the same getter recursively. However, there is no easy way at all for the compiler to prove whether some recursion is endless (in general that's the halting problem). The compiler could catch some easy cases, but then consumers would be surprised that the more complicated cases get through the compiler's checks.
The other cases it checks for (except recursive constructors) are invalid IL.
In addition, all of those cases (even recursive constructors) are guaranteed to fail.
However, it is possible, albeit unlikely, to intentionally create a useful recursive property (using if statements).