Compare 2 methods including comment trivia with Roslyn - C#

I need to compare two methods in C# code.
I found SyntaxNode.IsEquivalentTo, but when the methods are:
public void Method1()
{
//hello
}
and
public void Method1()
{
}
the return value is True.
Is there any other way in the Roslyn API to perform the comparison including comment trivia?
(so that the example above returns False)
(The reason I'm not using a regular string comparison is that I don't want spaces and new lines to count as differences. For example:
public void Method1()
{
int i=1;
}
and
public void Method1(){
int i=1 ;
}
should be considered equal.)

At the time of writing this, there is no built-in method to support such comparison, but it is easy to write one using syntax rewriters.
The basic idea is very simple. Write a CSharpSyntaxRewriter that will remove all the non-comment trivia from both of the compared nodes and then compare the newly created nodes by using the built in IsEquivalentTo() method.
The below code does what you are looking for. To compare two nodes (MethodDeclarationSyntax in your case) just call:
firstNode.IsEquivalentToWithCommentsPreserved(secondNode);
Here is the implementation:
public static class SyntaxNodeExtensions
{
public static bool IsEquivalentToWithCommentsPreserved(this SyntaxNode syntaxNode, SyntaxNode otherNode)
{
var triviaRemover = new NonCommentTriviaRemover();
return triviaRemover.Visit(syntaxNode)
.IsEquivalentTo(triviaRemover.Visit(otherNode));
}
private class NonCommentTriviaRemover : CSharpSyntaxRewriter
{
private static readonly SyntaxTrivia EmptyTrivia = default(SyntaxTrivia);
public override SyntaxTrivia VisitTrivia(SyntaxTrivia trivia)
{
return trivia.IsKind(SyntaxKind.SingleLineCommentTrivia) ||
trivia.IsKind(SyntaxKind.MultiLineCommentTrivia)
? trivia // Preserve comments by returning the original comment trivia.
: EmptyTrivia; // Remove all other trivias.
}
}
}
Keep in mind that this code does not ignore differences in whitespace inside the comments themselves. That means these two versions of the method will not be considered equivalent:
void Method()
{
// Some comment.
}
void Method()
{
//      Some comment.
}
Let me know if you maybe need to ignore these differences as well. I can then extend the solution to cover that case as well.
I quickly tried the solution on the following non-trivial example and it worked fine:
var firstCode =
@"
// First comment.
// Second comment.
int x(int a)
{
// This is a comment.
// And this as well.
if (a == 1) // This also
{
return 0 ;
}
/*
Multi line comment.
*/if(a == -5) return -10 ;
if (a == 2)
return 0 ;
return 5;
}
";
var secondCode =
@"
// First comment.
// Second comment.
int x(int a)
{
// This is a comment.
// And this as well.
if (a
== 1) // This also
{
return 0 ;
}
/*
Multi line comment.
*/
if(a == -5) return -10 ;
if (a == 2) return 0 ;
return 5;
}
";
var firstMethod = CSharpSyntaxTree.ParseText(firstCode).GetRoot().DescendantNodes().OfType<MethodDeclarationSyntax>().First();
var secondMethod = CSharpSyntaxTree.ParseText(secondCode).GetRoot().DescendantNodes().OfType<MethodDeclarationSyntax>().First();
Console.WriteLine($"{firstMethod.IsEquivalentTo(secondMethod)}"); // Prints False.
Console.WriteLine($"{firstMethod.IsEquivalentToWithCommentsPreserved(secondMethod)}"); // Prints True.
Still, before using the code in production, it would be good to write proper unit tests for it, add null checks, etc. ;-)

If they need to be exactly the same, you could simply call node.ToString() on the nodes and compare the strings.

Related

Is there an integer i where i<2 && i>10?

if(i<2 && i>10){
//code here will never be reached or what?
}
Just in case of an integer overflow maybe?
Edit: I wrote this not knowing c# is the language used. I've used C++ but I believe that the principle is also valid for c#.
there is no single integer that satisfies the condition
the compiler may well optimize away the body of the if condition (see here for an example on compiler explorer)
however, in the case of i being volatile it is possible that the value of i changes between the i<2 and the i>10 tests. In this case the if body can be reached.
However, though it may be theoretically possible it is highly unlikely that this was the intention.
Here's my example code
#include <iostream>
using std::cout;
void foo(int i)
{
if (i < 2 && i > 10)
{
cout << "The impossible happened in foo\n";
}
}
void bar(volatile int i)
{
if (i < 2 && i > 10)
{
cout << "The impossible happened in bar\n";
}
}
It is indeed possible for some C# code to enter that if (assuming C# because the question is tagged that way; i itself need not be an integer, since only the right-hand operands of the comparisons are integers, so it still matches the tags! ;-) ). Take:
public class Foo {
public static bool operator> (Foo a, int b) {
return true;
}
public static bool operator< (Foo a, int b) {
return true;
}
}
Then:
Foo i = new Foo();
if(i<2 && i>10){
Console.WriteLine("Pass!");
}
Guess the output? Check it out here
Another way, with no extra classes or operator overloading:
private static bool odd;
public static int i { get { odd = !odd; return odd ? 1 : 11; } }
Check it out
Otherwise, it could also happen with multithreading (if the value of i changes between the two comparisons), unless you apply correct locking.

C# - Limiting scope to ternary statement

EDIT: Originally, this post's example had dealt with hash codes, so you will see some comments using param.GetHashCode(), rather than (1+param). To get more to the point, I have changed the functions to calculate one plus the absolute value of some number.
Let's say that I want to create a function that calculates the absolute value of some integer (without using Math.Abs). I could write something similar to:
int absoluteValueOfOnePlus(int param)
{
int onePlusParam= 1 + param;
return ((onePlusParam> 0) ? (onePlusParam) : (-onePlusParam) );
}
I'm looking to limit the scope of onePlusParam to within the ternary statement, something similar to:
int absoluteValueOfOnePlus(int param)
{
return (((int onePlusParam = 1 + param) > 0) ? (onePlusParam) : (-onePlusParam) );
}
I understand that this is not valid C#, but it proves a good example for what I'm trying to perform--create some variable which exists only in the scope of a ternary operator.
The parts of a ternary expression are expressions. If the language designers were to allow what you're asking for, they would probably do it for all expressions rather than just for ternary expressions. You would then also be able to do if ((int n = foo()) != 0) bar(n);.
In C#, declarations are statements, not expressions. So the answer is no, you can't do this. However, the for statement can take a declaration, so the closest you can get to a single statement is this:
for (int i = param.GetHashCode();;)
return (i > 0) ? i : -i;
which is technically a single statement, albeit a compound one, and on two lines. But that looks like awful code and I wouldn't write it like that.
If your main concern is minimizing the scope of i, then use a small scope for it:
int positiveHash(string param)
{
// Some statements here...
// ...
// Start a small scope
{
int i = param.GetHashCode();
if (...)
return ((i > 0) ? (i) : (-i) );
}
// Some more C# statements here.
// i is out of scope here.
}
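Newer compilers offer another way to keep the intermediate value local to the expression. The following is only a minimal sketch, assuming C# 8.0 or later (switch expressions and pattern variables postdate this discussion); the class name ScopeDemo is hypothetical. Each arm declares its own pattern variable, which is not visible outside that arm:

```csharp
using System;

public static class ScopeDemo
{
    // Assumes C# 8.0+. The value of (1 + param) is bound to a pattern
    // variable that exists only inside the arm that declares it.
    public static int AbsoluteValueOfOnePlus(int param) =>
        (1 + param) switch
        {
            var onePlusParam when onePlusParam > 0 => onePlusParam,
            var onePlusParam => -onePlusParam,
        };
}
```

Calling ScopeDemo.AbsoluteValueOfOnePlus(-6) evaluates 1 + (-6) once and returns 5; no onePlusParam variable is visible in the rest of the method.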
I would simply write:
int GetPositiveHash(string param)
{
return Math.Abs(param.GetHashCode());
}
or
int GetPositiveHash(string param)
{
int hashCode = param.GetHashCode();
return Math.Abs(hashCode);
}
This aids readability and maintainability and, more importantly in this case, avoids premature optimization, which is the root of all evil.
If you are really worried about performance, then profile your code and see where your biggest bottlenecks are. I'd be surprised if GetPositiveHash() is causing the biggest bottleneck.
You might like to have a look at the .NET Framework source code for String.GetHashCode(). You'll see that a ternary operator is going to yield quite a minimal saving compared to what is going on inside the GetHashCode() method.
It's worth remembering:
The full version of the quote is "We should forget about small
efficiencies, say about 97% of the time: premature optimization is the
root of all evil." and I agree with this. Its usually not worth
spending a lot of time micro-optimizing code before its obvious where
the performance bottlenecks are.
from The fallacy of premature optimization
You could substitute having a data variable (i) in scope to having a function variable in scope. The advantage is a function is more likely to be written only once and not likely to be misused.
int positiveHash(string param)
{
Func<int, int> absoluteValue = i => (i > 0) ? i : -i;
return absoluteValue(param.GetHashCode());
}
And my attempt
static int positiveHash(string param)
{
return new List<string>() {param}.Select(s => s.GetHashCode()).Select(i => (i > 0) ? (i) : (-i)).Single();
}
(Of course, your code (and mine) is bad; you need to split your method into 2 smaller ones.)
and the updated question
static int absoluteValueOfOnePlus(int intparam)
{
return new List<int> { intparam }.Select(n => n + 1).Select(i => (i > 0) ? (i) : (-i)).Single();
}
Besides just creating a new block, you could also use the built-in absolute value function Math.Abs(...) or define your own lambda/function:
...built in ...
public static int hash(string param)
{
return Math.Abs(param.GetHashCode());
}
... lambda ...
static Func<int, int> abs = i => i > 0 ? i : -i;
public static int hash(string param)
{
return abs(param.GetHashCode());
}
... static function ...
static int Abs(int i)
{
return i > 0 ? i : -i;
}
public static int hash(string param)
{
return Abs(param.GetHashCode());
}

c# Using ArrayLists inside properties

I want to have a class named Musician with a property called Hits (an ArrayList) and two methods called ListHits() and AddHit(string).
ListHits returns a string containing all the hits separated by a comma.
AddHit adds a hit to the Hits ArrayList. Each hit is a string between 1 and 50 characters long with no leading or trailing white space.
I have no idea how to go about doing this. I'm familiar with collections and adding values to lists, and I know how to set basic properties.
I have tried for hours on end, please help!
public class Musician : Celebrity
{
private string _hits;
public string Hits
{
get { return _hits; }
set
{
if (value.Length < 1)
{
throw new Exception("need more then 2 characters");
}
if (value.Length > 50)
{
throw new Exception("needs to be less then 50 characters");
}
else
{
_hits = value.Trim();
}
}
}
public Musician()
{
//
// TODO: Add constructor logic here
//
}
}
First off, you should try using a List<string> rather than an ArrayList. ArrayList was what you used before C# added generics in version 2.0. List<T> allows you to retain typing information about the items in the list, which enables you to more easily write correct code.
The code you posted didn't seem to really match the details you were asking for, but something like this should do what you specified:
public class Musician
{
private readonly List<string> _hits = new List<string>(); // initialize, otherwise AddHit throws NullReferenceException
public string ListHits()
{
return string.Join(", ", _hits);
}
public void AddHit(string hit)
{
/*
* validate the hit
*/
_hits.Add(hit);
}
}
The key is using string.Join to convert the _hits list into a comma-delimited string. From there, the rest is just basic C# concepts.
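The validation left as a comment above could be filled in along these lines. This is only a sketch under stated assumptions: it rejects invalid hits with an exception rather than trimming them silently (the question does not say which is wanted), the exception types and messages are my own choice, and the Celebrity base class from the question is omitted so the example stands alone.

```csharp
using System;
using System.Collections.Generic;

public class Musician
{
    private readonly List<string> _hits = new List<string>();

    // Returns all hits as one comma-separated string.
    public string ListHits()
    {
        return string.Join(", ", _hits);
    }

    // Adds a hit after checking the rules from the question:
    // 1 to 50 characters, no leading or trailing white space.
    public void AddHit(string hit)
    {
        if (hit == null)
            throw new ArgumentNullException(nameof(hit));
        if (hit.Length < 1 || hit.Length > 50)
            throw new ArgumentException("A hit must be between 1 and 50 characters long.", nameof(hit));
        if (hit != hit.Trim())
            throw new ArgumentException("A hit must not have leading or trailing white space.", nameof(hit));

        _hits.Add(hit);
    }
}
```

With this version, adding "Yesterday" and then "Help!" makes ListHits() return "Yesterday, Help!", while AddHit(" Help!") throws instead of storing the padded value.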

Unit Testing Brain Freeze

I have a class method that looks like this:
private List<string> DataStoreContents = new List<string>(new[] { "", "", "", "" });
public void InputDataStore(int DataStore, string Data)
{
DataStoreContents[DataStore - 1] = Data;
}
I want to make sure that DataStore is >=1 and <= 4
How can I write a unit test that ensures that?
Either
Assert.IsTrue(DataStore >= 1 && DataStore <= 4);
or, if you prefer the fluent interface
Assert.That(DataStore, Is.GreaterThanOrEqualTo(1).And.LessThanOrEqualTo(4));
[EDIT - in response to your clarification above]
It sounds like you want to have some sort of barrier checking to check that the supplied values are in range.
In this case, you have a few choices:
Philip Fourie has given an answer involving code contracts.
Another simple approach is to write the barrier check yourself:
public void InputDataStore(int DataStore, string Data)
{
if (DataStore < 1 || DataStore > 4)
{
throw new ArgumentOutOfRangeException("DataStore", "Must be in the range 1-4 inc.");
}
DataStoreContents[DataStore - 1] = Data;
}
If you don't want to throw an exception, but maybe want to log it and exit cleanly:
public void InputDataStore(int DataStore, string Data)
{
if (DataStore < 1 || DataStore > 4)
{
// log something here and then return
return;
}
DataStoreContents[DataStore - 1] = Data;
}
To link back to unit testing: a unit test could, for example, check that when InputDataStore is called with a value that is out of range, it throws an exception. Another would check that when it is called with a value in range, it doesn't throw an exception and it updates DataStoreContents correctly.
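The two checks described above can be sketched without any test framework. The names DataStores, ReadDataStore and DataStoresChecks are hypothetical; in particular, ReadDataStore is an accessor added only so that a test can observe the private list, and the real class may expose its contents differently. In a real project the two helper methods would become test methods in whichever framework is in use.

```csharp
using System;
using System.Collections.Generic;

public class DataStores
{
    private readonly List<string> DataStoreContents = new List<string>(new[] { "", "", "", "" });

    public void InputDataStore(int DataStore, string Data)
    {
        if (DataStore < 1 || DataStore > 4)
        {
            throw new ArgumentOutOfRangeException("DataStore", "Must be in the range 1-4 inc.");
        }
        DataStoreContents[DataStore - 1] = Data;
    }

    // Hypothetical read accessor so a test can observe the stored value.
    public string ReadDataStore(int DataStore)
    {
        return DataStoreContents[DataStore - 1];
    }
}

public static class DataStoresChecks
{
    // True when an out-of-range index is rejected with ArgumentOutOfRangeException.
    public static bool ThrowsForIndex(int index)
    {
        try
        {
            new DataStores().InputDataStore(index, "don't care");
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }

    // True when an in-range call stores the value at the requested position.
    public static bool StoresValue(int index)
    {
        var stores = new DataStores();
        stores.InputDataStore(index, "payload");
        return stores.ReadDataStore(index) == "payload";
    }
}
```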
Assert.IsTrue(DataStore >= 1 && DataStore <= 4);
Perhaps this? (should all fail until you fix)
[Test]
[TestCase(5)]
[TestCase(0)]
[TestCase(int.MaxValue)]
[TestCase(int.MinValue)]
public void InvalidIndices(int index)
{
Assert.DoesNotThrow(() => yourObj.InputDataStore(index, "don't care"));
}
or (should all pass)
[Test]
[TestCase(5)]
[TestCase(0)]
[TestCase(int.MaxValue)]
[TestCase(int.MinValue)]
public void InvalidIndices(int index)
{
Assert.Throws<ArgumentOutOfRangeException>(() => yourObj.InputDataStore(index, "don't care")); // List<T> indexers throw ArgumentOutOfRangeException, not IndexOutOfRangeException
}
You can also use a code contract, which brings a lot of other benefits such as static code checking.
This means that you will be warned during 'code time' about using the method incorrectly.
public void InputDataStore(int DataStore, string Data)
{
Contract.Requires(DataStore >= 1 && DataStore <= 4);
DataStoreContents[DataStore - 1] = Data;
}
A good read here: http://devjourney.com/blog/code-contracts-part-1-introduction/
I think you cannot really "test" here.
You can insert a check, which will be executed at runtime. Said check might help, but it will not be much more helpful than the ArgumentOutOfRangeException you'd get anyway...
Also, inserting a check is not the same thing as testing.
You should look at the Callers of the InputDataStore Function.
These you can test: Create some different situations, execute the callers and check whether they pass the right value to InputDataStore.

How does Assert.AreEqual determine equality between two generic IEnumerables?

I have a unit test to check whether a method returns the correct IEnumerable. The method builds the enumerable using yield return. The class that it is an enumerable of is below:
enum TokenType
{
NUMBER,
COMMAND,
ARITHMETIC,
}
internal class Token
{
public TokenType type { get; set; }
public string text { get; set; }
public static bool operator == (Token lh, Token rh) { return (lh.type == rh.type) && (lh.text == rh.text); }
public static bool operator != (Token lh, Token rh) { return !(lh == rh); }
public override int GetHashCode()
{
return text.GetHashCode() ^ type.GetHashCode(); // XOR rather than %, which divides by zero when type is NUMBER (hash code 0)
}
public override bool Equals(object obj)
{
return this == (Token)obj;
}
}
This is the relevant part of the method:
foreach (var lookup in REGEX_MAPPING)
{
if (lookup.re.IsMatch(s))
{
yield return new Token { type = lookup.type, text = s };
break;
}
}
If I store the result of this method in actual, make another enumerable expected, and compare them like this...
Assert.AreEqual(expected, actual);
..., the assertion fails.
I wrote an extension method for IEnumerable that is similar to Python's zip function (it combines two IEnumerables into a set of pairs) and tried this:
foreach(Token[] t in expected.zip(actual))
{
Assert.AreEqual(t[0], t[1]);
}
It worked! So what is the difference between these two Assert.AreEquals?
Found it:
Assert.IsTrue(expected.SequenceEqual(actual));
Have you considered using the CollectionAssert class instead...considering that it is intended to perform equality checks on collections?
Addendum:
If the 'collections' being compared are enumerations, then simply wrapping them with 'new List<T>(enumeration)' is the easiest way to perform the comparison. Constructing a new list causes some overhead of course, but in the context of a unit test this should not matter too much I hope?
Assert.AreEqual is going to compare the two objects at hand. IEnumerables are types in and of themselves, and provide a mechanism to iterate over some collection...but they are not actually that collection. Your original comparison compared two IEnumerables, which is a valid comparison...but not what you needed. You needed to compare what the two IEnumerables were intended to enumerate.
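The distinction can be seen without any test framework. The sketch below assumes, as this answer describes, that the assertion ends up comparing the two enumerable objects via Object.Equals; frameworks differ on this, so check the one you are using. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableEqualityDemo
{
    // Iterator method: every call returns a new, lazily evaluated object.
    public static IEnumerable<int> Numbers()
    {
        yield return 1;
        yield return 2;
    }

    // Compares the two iterator objects themselves. The compiler-generated
    // iterator type does not override Equals, so this is reference equality.
    public static bool SameObject()
    {
        return Numbers().Equals(Numbers());
    }

    // Compares the elements the two iterators produce, pairwise and in order.
    public static bool SameElements()
    {
        return Numbers().SequenceEqual(Numbers());
    }
}
```

SameObject() returns false even though both iterators yield 1 and 2, while SameElements() returns true. That is the same difference as between the failing assertion and the element-by-element comparison in the question.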
Here is how I compare two enumerables:
Assert.AreEqual(t1.Count(), t2.Count());
IEnumerator<Token> e1 = t1.GetEnumerator();
IEnumerator<Token> e2 = t2.GetEnumerator();
while (e1.MoveNext() && e2.MoveNext())
{
Assert.AreEqual(e1.Current, e2.Current);
}
I am not sure whether the above is less code than your .Zip method, but it is about as simple as it gets.
I think the simplest and clearest way to assert the equality you want is a combination of the answer by jerryjvl and comment on his post by MEMark - combine CollectionAssert.AreEqual with extension methods:
CollectionAssert.AreEqual(expected.ToArray(), actual.ToArray());
This gives richer error information than the SequenceEqual answer suggested by the OP (it will tell you which element was found that was unexpected). For example:
IEnumerable<string> expected = new List<string> { "a", "b" };
IEnumerable<string> actual = new List<string> { "a", "c" }; // mismatching second element
CollectionAssert.AreEqual(expected.ToArray(), actual.ToArray());
// Helpful failure message!
// CollectionAssert.AreEqual failed. (Element at index 1 do not match.)
Assert.IsTrue(expected.SequenceEqual(actual));
// Mediocre failure message:
// Assert.IsTrue failed.
You'll be really pleased you did it this way if/when your test fails - sometimes you can even know what's wrong without having to break out the debugger - and hey you're doing TDD right, so you write a failing test first, right? ;-)
The error messages get even more helpful if you're using AreEquivalent to test for equivalence (order doesn't matter):
CollectionAssert.AreEquivalent(expected.ToList(), actual.ToList());
// really helpful error message!
// CollectionAssert.AreEquivalent failed. The expected collection contains 1
// occurrence(s) of <b>. The actual collection contains 0 occurrence(s).
