Compare contents of System.Text.Json JsonDocument in a C# unit test

I feel like I'm taking crazy pills here... but how can I compare two JsonDocuments (or JsonElements or what have you) for JSON equality, i.e. same keys with same values?
To limit the scope, the ability to perform this comparison in a unit test is sufficient unto the day.
I have a unit test in which I want to compare the result of a function that returns a JsonDocument to some expected value. Things that don't work include using FluentAssertions.Json (because my type is not JObject), and comparing the value of GetRawText because I don't care about the whitespace.
I guess I could write the strings out and re-serialize them or something but this honestly feels like such a hack that I must be doing something wrong.
I understand the business logic of comparing them, and I have seen other questions like this and this. The first is much more in keeping with what I want; it's just an embarrassing result for C#...
The second is not what I need at all.

Thanks to @ChristophLütjen for identifying the fix I needed in this issue.
Given two JsonDocuments I can compare them using FluentAssertions like this:
Doc1.RootElement.Should().BeEquivalentTo(Doc2.RootElement, opt => opt.ComparingByMembers<JsonElement>());
I'm still bemused that there is not a more straightforward comparison method out there for the general case, but since my immediate use was unit testing, I'm satisfied with this result.
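For completeness, here is how that assertion looks inside a full test - a minimal sketch assuming xUnit and FluentAssertions; the test name and sample JSON are illustrative:

using System.Text.Json;
using FluentAssertions;
using Xunit;

public class JsonEquivalenceTests
{
    [Fact]
    public void Documents_with_same_keys_and_values_are_equivalent()
    {
        // Same content, different whitespace - so comparing GetRawText() would fail.
        using JsonDocument doc1 = JsonDocument.Parse("{\"a\":1,\"b\":[true,null]}");
        using JsonDocument doc2 = JsonDocument.Parse("{ \"a\": 1, \"b\": [ true, null ] }");

        doc1.RootElement.Should().BeEquivalentTo(
            doc2.RootElement,
            opt => opt.ComparingByMembers<JsonElement>());
    }
}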

Related

How to assert multiple variables simultaneously with NUnit?

This ought to be possible - and I think it might be possible somehow using Constraints, but they seem to take only one variable as input, which doesn't cover this case.
Here's the simplest example I've run into:
I have an object with size (i.e. a 2D variable)
I need to show it's partly off-screen
This happens if x is outside a range OR y is outside a range
... I can't see how to achieve that in NUnit (without throwing away critical info)
It seems that NUnit fundamentally doesn't support anything other than 1-dimensional problems, which would be absurd, so I must be missing something here. But the only methods I can find all only work with 1-d inputs.
Things I've thought of ... but don't work:
i. Assert.True( A | B ) - useless: it throws away all the "expected" info and generates such weak messages that there's essentially no point using a testing framework here
ii. Assert.Or - doesn't allow multiple variables, so you can test "X is 3, or 4, or 5", but you cannot test "X is 3, or Y is 4"
iii. Writing custom assertions for a Tuple<A,B> - but then it goes horribly wrong when you get to "greater than" ... whether "<A,B> is greater than <C,D>" is a non-trivial question, and it ends up with code that's vastly confusing; I can guarantee someone will misunderstand and misapply it in future (probably me :))
1-dimensional problems
To be frank, I don't really get it. If you think about it, all members of Assert are actually tools for boolean problems (zero dimensions), which makes perfect sense: every assertion ultimately raises a red or green flag.
Your problem is also a yes/no question: "Is my object off-screen?".
i. Assert.True( A | B ) - useless: it throws away all the "expected" info and generates such weak messages
Nothing stops you from specifying the message that is displayed if the assertion fails:
Assert.IsTrue(myObject.IsOnScreen(), $"Object is off-screen: {myObject.Bounds}");
But in this particular case you can easily turn it into an equality assertion:
// here also the default message may be clear enough
Assert.AreEqual(expected: screenBounds, actual: screenBounds.UnionWith(myObject.Bounds));
And voila, multiple (four) properties were compared at once...
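To make that concrete, here is a minimal sketch using System.Drawing.Rectangle; the screen and object bounds are made up, and Rectangle.Union plays the role of UnionWith above:

using System.Drawing;
using NUnit.Framework;

[TestFixture]
public class BoundsTests
{
    [Test]
    public void Object_is_fully_on_screen()
    {
        var screenBounds = new Rectangle(0, 0, 1920, 1080);
        var objectBounds = new Rectangle(100, 100, 200, 150);

        // If the object is entirely on-screen, merging its bounds into the
        // screen bounds changes nothing; any overhang enlarges the union.
        Assert.AreEqual(screenBounds, Rectangle.Union(screenBounds, objectBounds));
    }
}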
To be honest, you haven't done a great job of explaining what you want. I re-read several times and I think I have guessed right, but I shouldn't have to guess, so please include code with your next question.
I take it that you wish it were possible to change the actual value in the course of an NUnit constraint expression... something like this...
// NOT WORKING CODE
Assert.That(X, Is.OnScreen.And.That(Y, Is.OnScreen));
(Where "OnScreen" makes the test you need)
But you can't do that. To work within the existing features of NUnit, each Assertion needs to deal with one actual value.
Simplest thing I can think of is to use multiple assertions:
Assert.Multiple(() =>
{
    Assert.That(X.IsOnScreen);
    Assert.That(Y.IsOnScreen);
});
I say this is simple because NUnit already does it. It's not very expressive, however.
If you are doing a lot of such tests, the simplest thing is to create a custom constraint, which deals with your objects as an entity. Then you would be able to write something like
Assert.That(myObject, Is.OnScreen);
Not just as pseudo-code, but as actual code. Basically, you would be creating a set of tests on rectangles, representing the object bounds on the one hand and the screen dimensions on the other.
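As a rough illustration of that idea - a sketch only, where the IHasBounds interface, the hard-coded ScreenBounds, and the MyIs helper are all hypothetical:

using System.Drawing;
using NUnit.Framework.Constraints;

public interface IHasBounds
{
    Rectangle Bounds { get; }
}

public class OnScreenConstraint : Constraint
{
    private static readonly Rectangle ScreenBounds = new Rectangle(0, 0, 1920, 1080);

    public override ConstraintResult ApplyTo<TActual>(TActual actual)
    {
        Description = $"an object entirely within {ScreenBounds}";
        var subject = actual as IHasBounds;
        bool isOnScreen = subject != null && ScreenBounds.Contains(subject.Bounds);
        return new ConstraintResult(this, subject?.Bounds, isOnScreen);
    }
}

// NUnit lets you extend the constraint syntax by inheriting from Is:
public class MyIs : NUnit.Framework.Is
{
    public static OnScreenConstraint OnScreen => new OnScreenConstraint();
}

// Usage: Assert.That(myObject, MyIs.OnScreen);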

multiple expectations/assertions to verify test result

I have read in several book and articles about TDD and BDD that one should avoid multiple assertions or expectations in a single unit test or specification. And I can understand the reasons for doing so. Still I am not sure what would be a good way to verify a complex result.
Assuming a method under test returns a complex object as a result (e.g. deserialization or database read) how do I verify the result correctly?
1. Asserting on each property:
Assert.AreEqual(1, result.Property1);
Assert.AreEqual("2", result.Property2);
Assert.AreEqual(null, result.Property3);
Assert.AreEqual(4.0, result.Property4);
2. Relying on a correctly implemented .Equals():
Assert.AreEqual(expectedResult, result);
The disadvantage of 1. is that if the first assert fails all the following asserts are not run, which might have contained valuable information to find the problem. Maintainability might also be a problem as Properties come and go.
The disadvantage of 2. is that I seem to be testing more than one thing with this test. I might get false positives or negatives if .Equals() is not implemented correctly. Also, with 2. I do not see which properties actually differ when the test fails, though I assume that can often be addressed with a decent .ToString() override. In any case, I think I should avoid being forced to throw the debugger at failing tests to see the difference; I should see it right away.
The next problem with 2. is that it compares the whole object even though for some tests only some properties might be significant.
What would be a decent way or best practice to do this in TDD and BDD?
Don't take TDD advice literally. What the "good guys" mean is that you should test one thing per test (to avoid a test failing for multiple reasons and subsequently having to debug the test to find the cause).
Now test "one thing" means "one behavior" ; NOT one assert per test IMHO.
It's a guideline not a rule.
So options:
For comparing whole data value objects
If the object already exposes a usable production Equals, use it.
Else do not add an Equals just for testing (see also Equality Pollution). Use a helper/extension method (or find one in an assertion library): obj1.HasSamePropertiesAs(obj2)
For comparing unstructured parts of objects (set of arbitrary properties - which should be rare),
create a well-named private method such as AssertCustomerDetailsInOrderEquals(params) so that the test makes clear which part you're actually testing, and move the set of assertions into that private method.
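For instance - a sketch where Order, Customer, and the expected values are illustrative:

private static void AssertCustomerDetailsInOrderEquals(
    Order order, string expectedName, string expectedEmail)
{
    Assert.AreEqual(expectedName, order.Customer.Name);
    Assert.AreEqual(expectedEmail, order.Customer.Email);
}

// In the test body:
// AssertCustomerDetailsInOrderEquals(order, "Ada Lovelace", "ada@example.com");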
With the context present in the question I'd go for option 1.
It likely depends on context. If I'm using some sort of built in object serialization within the .NET framework, I can be reasonably assured that if no errors were encountered then the entire object was appropriately marshaled. In that case, asserting a single field in the object is probably fine. I trust MS libraries to do the right thing.
If you are using SQL and manually mapping results to domain objects I feel that option 1 makes it quicker to diagnose when something breaks than option 2. Option 2 likely relies on toString methods in order to render the assertion failure:
Expected <1 2 null 4.0> but was <1 2 null null>
Now I am stuck trying to figure out what field 4.0/null was. Of course I could put the field name into the method:
Expected <Property1: 1, Property2: 2, Property3: null, Property4: 4.0>
but was <Property1: 1, Property2: 2, Property3: null, Property4: null>
This is fine for small numbers of properties, but begins to break down with larger numbers due to wrapping, etc. Also, the toString maintenance could become an issue, as it needs to change at the same rate as the equals method.
Of course there is no correct answer, at the end of the day, it really boils down to your team's (or your own) personal preference.
Hope that helps!
Brandon
I would use the second approach by default. You're right, this fails if Equals() is not implemented correctly, but if you've implemented a custom Equals(), you should have unit-tested it too.
The second approach is in fact more abstract and concise and lets you modify the code more easily later, reducing code duplication along the way. Let's say you choose the first approach:
You'll have to compare all properties in several places,
If you add a new property to a class, you'll have to add a new assertion in the unit tests; modifying your Equals() would be much easier. Of course, you still have to add the property's value to the expected result (if it's not the default value), but that is quicker than adding a new assertion.
Also, it is much easier to see what properties are actually different with the second approach. You just run your tests in debug mode, and compare the properties on break.
By the way, you should never use ToString() for this. I suppose you meant the [DebuggerDisplay] attribute?
The next problem with 2. is that it compares the whole object even though for some tests only some properties might be significant.
If you have to compare only some properties, then:
either you refactor your code by implementing a base class which contains only those properties. Example: if you want to compare a Cat to another Cat while considering only the properties common to Dogs and other animals, implement Cat : Animal and compare the base class (see the sketch after this list);
or you do what you've done in your first approach. Example: if you do care only about the quantity of milk the expected and the actual cats have drunk and their respective names, you'll have two assertions in your unit test.
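A sketch of the base-class idea from the first option above, where Animal, Cat, and the properties are purely illustrative:

public abstract class Animal
{
    public string Name { get; set; }
    public double MilkDrunk { get; set; }

    public override bool Equals(object obj) =>
        obj is Animal other && Name == other.Name && MilkDrunk == other.MilkDrunk;

    public override int GetHashCode() => (Name, MilkDrunk).GetHashCode();
}

public class Cat : Animal
{
    // Cat-specific state, deliberately excluded from the comparison.
    public bool Purrs { get; set; }
}

// Assert.AreEqual(expectedCat, actualCat) now uses Animal.Equals, ignoring Purrs.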
Try "one aspect of behavior per test" rather than "one assertion per test". If you need more than one assertion to illustrate the behavior you're interested in, do that.
For instance, your example might be ShouldHaveSensibleDefaults. Splitting that up into ShouldHaveADefaultNameAsEmptyString, ShouldHaveNullAddress, ShouldHaveAQuantityOfZero etc. won't read as clearly. Nor will it help to hide the sensible defaults in another object then do a comparison.
However, I would separate examples where the values have defaults from examples where properties are derived by some logic, for instance ShouldCalculateTheTotalQuantity. Moving small examples like this into their own methods makes them more readable.
You might also find that different properties on your object are changed by different contexts. Calling out each of these contexts and looking at those properties separately helps me to see how the context relates to the outcome.
Dave Astels, who came up with the "one assertion per test", now uses the phrase "one aspect of behavior" too, though he still finds it useful to separate that behavior. I tend to err on the side of readability and maintainability, so if it makes pragmatic sense to have more than one assertion, I'll do that.

Redundant assert statements in .NET unit test framework

Isn't it true that every assert statement can be translated to an Assert.IsTrue, since by definition you are asserting whether something is true or false?
Why do test frameworks introduce options like AreEqual, IsNotNull, and especially IsFalse? I feel I spend too much time thinking about which Assert to use when I write unit tests.
You can use Assert.IsTrue all the time if you prefer. The difference is Assert.AreEqual and the like will give you a better error message when the assertion fails.
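A quick sketch of the difference; the failure messages in the comments are typical of NUnit-style frameworks, not verbatim, and ComputeAnswer is hypothetical:

int expected = 42;
int actual = ComputeAnswer(); // suppose it returns 41

Assert.IsTrue(expected == actual); // fails with: Expected: True  But was: False
Assert.AreEqual(expected, actual); // fails with: Expected: 42  But was: 41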
NUnit (and probably other frameworks) now supports syntax like this:
Assert.That(foo, Is.EqualTo(bar));
Given enough extra code on your part, yes, it's true that almost every Assert.XXX can be turned into an Assert.IsTrue call. However, there are some which are very difficult to translate, like Throws:
Assert.Throws<ArgumentNullException>(() => x.TestMethod(null));
Translating that to an Assert.IsTrue is possible but really not worth the effort. Much better to use the Throws method.
Yes,
AreEqual(obj1, obj2) basically does an Assert.IsTrue(obj1.Equals(obj2)). So I would say this is true.
But perhaps they introduced those overloads for readability, for example to make it obvious that two objects are being compared, or that a value is being checked against false.
It is very simple - to make your test code more readable.
Which is more readable
Assert.IsTrue(quantity > 0)
or
Assert.That(quantity, Is.GreaterThan(0))
Don't you think the second option is more readable?
You can write framework-agnostic asserts using a library called Should. It also has a very nice fluent syntax, which you can use if you like fluent interfaces. I wrote a blog post on this:
http://nileshgule.blogspot.com/2010/11/use-should-assertion-library-to-write.html

What's wrong with output parameters?

Both in SQL and C#, I've never really liked output parameters. I never passed parameters ByRef in VB6, either. Something about counting on side effects to get something done just bothers me.
I know they're a way around not being able to return multiple results from a function, but a rowset in SQL or a complex datatype in C# and VB work just as well, and seem more self-documenting to me.
Is there something wrong with my thinking, or are there resources from authoritative sources that back me up? What's your personal take on this and why? What can I say to colleagues that want to design with output parameters that might convince them to use different structures?
EDIT: An interesting turn - the output parameter I was asking this question about was used in place of a return value. When the return value is "ERROR", the caller is supposed to handle it as an exception. I was doing that, but wasn't pleased with the idea. A coworker wasn't informed of the need to handle this condition, and as a result a great deal of money was lost when the procedure failed silently!
Output parameters can be a code smell indicating that your method is doing too much. If you need to return more than one value, the method is likely doing more than one thing. If the data is tightly related, then it would probably benefit from a class that holds both values.
Of course, this is not ALWAYS the case, but I have found that it is usually the case.
In other words, I think you are right to avoid them.
They have their place. The Int32.TryParse method is a good example of effective use of an out parameter:
string value = "123";
bool result = Int32.TryParse(value, out int number);
if (result)
{
    Console.WriteLine("Converted '{0}' to {1}.", value, number);
}
Bob Martin wrote about this in Clean Code. Output params break the fundamental shape of a function:
output = someMethod(input)
I think they're useful for getting the IDs of newly-inserted rows in the same SQL command, but I don't think I've used them for much else.
I too see very little use for out/ref parameters, although in SQL it is sometimes easier to pass a value back via a parameter than via a resultset (which would then require the use of a DataReader, etc.).
Though, as luck would have it, I just created one such rare function in C# today. It validated a table-like data structure and returned the number of rows and columns in it (which was tricky to calculate because the table could have rowspans/colspans like in HTML). In this case the calculation of both values was done at the same time; separating it into two functions would have doubled the code, memory, and CPU time requirements. Creating a custom type just for this one function's return value also seems like overkill to me.
So - there are times when they are the best thing, but mostly you can do just fine without them.
The OUTPUT clause in SQL Server 2005 onwards is a great step forward for getting field values from rows affected by your DML statements. I think there are a lot of situations where this does away with output parameters.
In VB6, ByRef parameters are good for passing ADO objects around.
Other than those two specific cases that come to mind, I tend to avoid using them.
In SQL only...
Stored procedure output parameters are useful.
Say you need one value back. Do you CREATE TABLE #temp, INSERT ... EXEC, SELECT @var = ... - or use an output parameter?
For client calls, an output parameter is far quicker than processing a recordset (see the C# sketch after this list).
Using RETURN values is limited to signed integers.
Output parameters are easier to re-use (e.g. in a security-check helper procedure).
When using both: recordsets = data, output parameters = status/messages/rowcount etc.
Stored procedure recordset output cannot be strongly typed, unlike UDFs or client code.
You can't always use a UDF (e.g. logging during the security check above).
However, as long as you don't generally use the same parameter for input and output, your options are limited until SQL changes completely. That said, I have one case where I use a parameter for both in and out values, but I have a good reason for it.
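As mentioned in the list above, here is a sketch of the client-call side in C#; the connection string, procedure name, and parameters are hypothetical:

using System;
using System.Data;
using System.Data.SqlClient;

using (var conn = new SqlConnection("Server=.;Database=Shop;Integrated Security=true"))
using (var cmd = new SqlCommand("dbo.CheckSecurity", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@UserId", 42);

    var status = cmd.Parameters.Add("@Status", SqlDbType.Int);
    status.Direction = ParameterDirection.Output;

    conn.Open();
    cmd.ExecuteNonQuery();

    // No DataReader, no recordset - just read the output parameter.
    Console.WriteLine("Status: {0}", status.Value);
}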
My Two Cents:
I agree that output parameters are a concerning practice. VBA is often maintained by people very new to programming, and if someone maintaining your code fails to notice that a parameter is ByRef, they could introduce some serious logical errors. It also tends to break the Property/Function/Sub paradigm.
Another reason that using out parameters is bad practice is that if you really do need to be returning more than one value, chances are that you should have those values in a data structure such as a class or a User Defined Type.
They can, however, solve some problems. VB5 (and therefore VBA for Office 97) did not allow a function to return an array, which meant anything returning or altering an array had to do so via an "out" parameter. In VB6 this ability was added, but VB6 still forces array parameters to be passed by reference (to prevent excessive copying in memory). Now you can return a value from a function that alters an array, but it will be just a hair slower (due to the acrobatics going on behind the scenes); it can also confuse newbies into thinking that the input array will not be altered (which will only be true if someone specifically structured it that way). So I find that if I have a function that alters an array, it reduces confusion to just use a sub instead of a function (and it will be a tiny bit faster too).
Another possible scenario would be if you are maintaining code and you want to add an out value without breaking the interface you can add an optional out parameter and be confident you won't be breaking any old code. It's not good practice, but if someone wants something fixed right now and you don't have time to do it the "right way" and restructure everything, this can be a handy addition to your tool box.
However if you are developing things from the ground up and you need to return multiple values you should consider:
1. Breaking up the function.
2. Returning a UDT.
3. Returning a Class.
I generally never use them, I think they are confusing and too easy to abuse. We do occasionally use ref parameters but that has more to do with passing in structures vs. getting them back.
Your opinion sounds reasonable to me.
Another drawback of output parameters is the extra code needed to pass results from one function to another. You have to declare the variables, call the function to populate them, and then pass the values on to another function. You can't just nest function calls. This makes code read very imperatively rather than declaratively.
C++0x is getting tuples - an anonymous struct-like thing whose members you access by index. C++ programmers will be able to pack multiple values into one of those and return it. Does C# have something like that? Can it return an array, perhaps, instead? But yeah, output parameters are a bit awkward and unclear.
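For what it's worth, later C# versions did add exactly this in the form of value tuples (C# 7). A minimal sketch reusing the rows/columns example from the earlier answer; MeasureTable and its body are hypothetical:

(int rows, int cols) MeasureTable(string html)
{
    // ... hypothetical rowspan/colspan calculation ...
    return (3, 4);
}

var (rows, cols) = MeasureTable("<table>...</table>");
Console.WriteLine($"{rows} x {cols}");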

Is it good form to expose derived values as properties?

I need to derive an important value given 7 potential inputs. Uncle Bob urges me to avoid functions with that many parameters, so I've extracted the class. All parameters now being properties, I'm left with a calculation method with no arguments.
“That”, I think, “could be a property, but I'm not sure if that's idiomatic C#.”
Should I expose the final result as a property, or as a method with no arguments? Would the average C# programmer find properties confusing or offensive? What about the Alt.Net crowd?
decimal consumption = calculator.GetConsumption(); // obviously derived
decimal consumption = calculator.Consumption; // not so obvious
If the latter: should I declare interim results as [private] properties, also? Thanks to heavy method extraction, I have several interim results. Many of these shouldn't be part of the public API. Some of them could be interesting, though, and my expressions would look cleaner if I could access them as properties:
decimal interim2 = this.ImportantInterimValue * otherval;
Happy Experiment Dept.:
While debugging my code in VS2008, I noticed that I kept hovering my mouse over the method calls that compute interim results, expecting a hover-over with their return value. After turning all methods into properties, I found that exposing interim results as properties greatly assisted debugging. I'm well pleased with that, but have lingering concerns about readability.
The interim value declarations look messier. The expressions, however, are easier to read without the brackets. I no longer feel compelled to start the method name with a verb. To contrast:
// Clean method declaration; compulsive verby name; callers need
// parentheses despite the lack of any arguments.
decimal DetermineImportantInterimValue() {
    return this.DetermineOtherInterimValue() * this.SomeProperty;
}

// Messier property declaration; clean name; clean access syntax
decimal ImportantInterimValue {
    get {
        return this.OtherInterimValue * this.SomeProperty;
    }
}
I should perhaps explain that I've been coding in Python for a decade. I've been left with a tendency to spend extra time making my code easier to call than to write. I'm not sure the Python community would regard this property-oriented style as acceptably “Pythonic”, however:
def determineImportantInterimValue(self):
    "The usual way of doing it."
    return self.determineOtherInterimValue() * self.someAttribute

importantInterimValue = property(
    lambda self: self.otherInterimValue * self.someAttribute,
    doc = "I'm not sure if this is Pythonic...")
The important question here seems to be this:
Which one produces more legible, maintainable code for you in the long run?
In my personal opinion, isolating the individual calculations as properties has a couple of distinct advantages over a single monolithic method call:
You can see the calculations as they're performed in the debugger, regardless of the class method you're in. This is a boon to productivity while you're debugging the class.
If the calculations are discrete, the properties will execute very quickly, which means (in my opinion) they observe the rules for property design. It's absurd to think that a design guideline should be treated as a straitjacket. Remember: there is no silver bullet.
If the calculations are marked private or internal, they do not add unnecessary complexity to consumers of the class.
If all of the properties are discrete enough, compiler inlining may resolve the performance issues for you.
Finally, if the method that returns your final calculation is far and away easier to maintain and understand because you can read it, that is an utterly compelling argument in and of itself.
One of the best things you can do is think for yourself and dare to challenge the preconceived One Size Fits All notions of our peers and predecessors. There are exceptions to every rule. This case may very well be one of them.
Postscript:
I do not believe that we should abandon standard property design in the vast majority of cases. But there are cases where deviating from The Standard(TM) is called for, because it makes sense to do so.
Personally, I would prefer that you make your public API a method instead of a property. Properties are supposed to be as 'fast' as possible in C#. More details in this discussion: Properties vs Methods
Internally, GetConsumption can use any number of private properties to arrive at the result; the choice is yours.
I usually go by what the method or property will do. If it is something that is going to take a little time, I'll use a method. If it's very quick or has a very small number of operations going on behind the scenes, I'll make it a property.
I tend to use methods to denote any action on an object, or anything that changes its state; so in this case I would name the function CalculateConsumption(), which computes the value from the other properties.
You say you are deriving a value from seven inputs; you have implemented seven properties, one for each input, and a property getter for the result. Some things you might want to consider:
What happens if the caller fails to set one or more of the seven "input" properties? Does the result still make sense? Will an exception be thrown (e.g. divide by zero)?
In some cases the API may be less discoverable. If I must call a method that takes seven parameters, I know that I must supply all seven parameters to get the result. And if some of the parameters are optional, different overloads of the method make it clear which ones.
In contrast, it may not be so clear that I have to set seven properties before accessing the "result" property, and it could be easy to forget one.
When you have a method with several parameters, you can more easily have richer validation. For example, you could throw an ArgumentException if "parameter A and parameter B are both null".
If you use properties for your inputs, each property is set independently, so you can't perform the validation when the inputs are being set - only when the result property is read, which may be less intuitive.
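A sketch of that last point; ConsumptionCalculator and its inputs are illustrative:

public class ConsumptionCalculator
{
    public decimal? Rate { get; set; }
    public decimal? Hours { get; set; }
    // ... five more inputs ...

    // As a method, validating the combination of inputs has a natural home;
    // a result property would have to repeat this check on every read.
    public decimal GetConsumption()
    {
        if (Rate == null || Hours == null)
            throw new InvalidOperationException("Rate and Hours must both be set.");
        return Rate.Value * Hours.Value;
    }
}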
