FluentAssertions: ShouldBeEquivalentTo vs Should().Be() vs Should().BeEquivalentTo()?

Can anybody summarize differences and usage scope between them?
I read SO articles,
ShouldBeEquivalentTo(): intended to be used for comparing complex object graphs rather than the primitive types that are part of the .NET framework.
Should().BeEquivalentTo(): uses the individual items' Equals() implementation to verify equivalence and has been around since version 1. The newer ShouldBeEquivalentTo(), introduced in FA 2.0, does an in-depth structural comparison and also reports on any differences.
Should().Be(): I cannot find anything about it.
In my humble understanding, ShouldBeEquivalentTo() and Should().BeEquivalentTo() work similarly, if Should().BeEquivalentTo() really does an in-depth comparison.

I agree this is confusing. Should().BeEquivalentTo() should actually be called Should().EqualInAnyOrder() or something like that. As you said, it uses the Equals implementation of the involved objects to see if all of the ones in the expected collection appear in the actual collection, regardless of order. I'll need to fix that for the next major version.
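A minimal sketch of the three calls (FluentAssertions 2.x-era API as discussed above; the Person type is my own example, not from the docs):

using FluentAssertions;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class Examples
{
    public static void Run()
    {
        // Should().Be() asserts equality via Equals(); fine for primitives,
        // but two distinct Person instances with equal data would fail.
        42.Should().Be(42);

        // Should().BeEquivalentTo() (the collection assertion, around since
        // v1) checks that both collections contain the same items per
        // Equals(), regardless of order.
        new[] { 1, 2, 3 }.Should().BeEquivalentTo(new[] { 3, 2, 1 });

        // ShouldBeEquivalentTo() (new in FA 2.0) walks the object graph and
        // compares property values structurally, reporting all differences.
        var actual = new Person { Name = "Ada", Age = 36 };
        var expected = new Person { Name = "Ada", Age = 36 };
        actual.ShouldBeEquivalentTo(expected);
    }
}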

Why is memory storage location linked to class/struct?

C# is, unlike C++, a language that hides technical stuff from the developer. No pointers (except in unsafe code) and garbage collection are examples of this. As I understand it, C# wants the developer to focus only on the concepts and not on the underlying architecture, memory handling, etc.
But then, why does the developer have to decide where an object is to be stored? For class it is always on the heap, for struct it is either on the stack (if local variable) or inline (if member of an object).
Isn't that something the compiler could figure out either based on the class definition (it could estimate needed memory space and decide heuristically based on that) or based on the context a given instance is in (is it a local variable in a function, then stack; is it more global, then heap; is it member of an object, then base decision on its estimated memory space)?
PS: I know class and struct have more differences than that, namely reference equality versus value equality, but that is not the point of my question. (And for those aspects, other solutions could be found to unlink these properties from the class/struct decision.)
Your question is not valid (I mean in a logical way) because it depends on a false premise:
The developer cannot really decide where an object is to be stored, because this is an implementation detail.
See this answer discussing struct on heap or stack,
or this question: C# structs/classes stack/heap control?
The first links to Eric Lippert's blog. Here is an extract:
Almost every article I see that describes the difference between value types and reference types explains in (frequently incorrect) detail about what “the stack” is and how the major difference between value types and reference types is that value types go on the stack. I’m sure you can find dozens of examples by searching the web.

I find this characterization of a value type based on its implementation details rather than its observable characteristics to be both confusing and unfortunate. Surely the most relevant fact about value types is not the implementation detail of how they are allocated, but rather the by-design semantic meaning of “value type”, namely that they are always copied “by value”. If the relevant thing was their allocation details then we’d have called them “heap types” and “stack types”. But that’s not relevant most of the time. Most of the time the relevant thing is their copying and identity semantics.

I regret that the documentation does not focus on what is most relevant; by focusing on a largely irrelevant implementation detail, we enlarge the importance of that implementation detail and obscure the importance of what makes a value type semantically useful. I dearly wish that all those articles explaining what “the stack” is would instead spend time explaining what exactly “copied by value” means and how misunderstanding or misusing “copy by value” can cause bugs.
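To make the quoted point concrete, here is a small example of “copied by value” semantics and the kind of bug it can cause (my own illustration, not code from the quoted post):

struct Point { public int X; }

class CopyDemo
{
    static void Main()
    {
        var p = new Point { X = 1 };
        var q = p;                      // value type: q is a copy of p
        q.X = 42;                       // mutates only the copy
        System.Console.WriteLine(p.X);  // prints 1, not 42
    }
}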
C# is, unlike C++, a language that hides technical stuff from the developer
I do not think this is a fair characterization of C#. There are plenty of technical details a C# developer has to be aware of, and there are plenty of languages that work at a much higher abstraction level. I think it would be fairer to say that C# aims to make it easy to write good, working code, sometimes called "the pit of success" by Eric Lippert. See also C++ and the pit of despair.
Ideally you would just write code that describes the problem, and let the compiler sort out anything that has to do with performance. But writing compilers is hard, especially when you have a hard time constraint because you are compiling just in time. While language is theoretically unrelated to performance, practice shows that higher-level languages tend to be more difficult to optimize.
While there are important semantic differences between a struct and a class, the main reason to choose one or the other usually comes down to performance, and the performance is directly related to how they are stored and passed around. You would typically avoid allocating many small objects, or passing around huge structs.
As a comparison, Java is very similar to C#, and did just fine without value types for many years. It seems, however, that they have introduced, or will introduce, one to reduce the overhead of creating objects.
why does the developer have to decide where an object is to be stored?
The simple answer seems to be that determining the optimal storage location is difficult for the compiler to do. Letting the developer hint at how the type is used helps improve performance in some situations and allows C# to be used in situations where it would otherwise be unsuitable, at the cost of making the language more complex and more difficult to learn.

Are languages really dependent on libraries?

I've always wondered how the dependencies are managed from a programming language to its libraries. Take for example C#. When I was beginning to learn about computing, I would assume (wrongly as it turns out) that the language itself is designed independently of the class libraries that would eventually become available for it. That is, the set of language keywords (such as for, class or throw) plus the syntax and semantics are defined first, and libraries that can be used from the language are developed separately. The specific classes in those libraries, I used to think, should not have any impact on the design of the language.
But that doesn't work, or not all the time. Consider throw. The C# compiler makes sure that the expression following throw resolves to an exception type. Exception is a class in a library, and as such it should not be special at all. It would be a class like any other, except that the C# compiler assigns it special semantics. That is very good, but my conclusion is that the design of the language does depend on the existence and behaviour of specific elements in the class libraries.
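For instance, a minimal illustration of that compiler check (my example):

class ThrowDemo
{
    void Fail()
    {
        // Compiles: InvalidOperationException derives from System.Exception.
        throw new System.InvalidOperationException("boom");

        // Does not compile: the thrown expression must be of a type derived
        // from (or identical to) System.Exception.
        // throw "just a string";
    }
}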
Additionally, I wonder how this dependency is managed. If I were to design a new programming language, what techniques would I use to map the semantics of throw to the very particular class that is Exception?
So my questions are two:
Am I correct in thinking that language design is tightly coupled to that of its base class libraries?
How are these dependencies managed from within the compiler and run-time? What techniques are used?
Thank you.
EDIT. Thanks to those who pointed out that my second question is very vague. I agree. What I am trying to learn is what kind of references the compiler stores about the types it needs. For example, does it find the types by some kind of unique id? What happens when a new version of the compiler or the class libraries is released? I am aware that this is still pretty vague, and I don't expect a precise, single-paragraph answer; rather, pointers to literature or blog posts are most welcome.
What I am trying to learn is what kind of references the compiler stores about the types it needs. For example, does it find the types by some kind of unique id?
Obviously the C# compiler maintains an internal database of all the types available to it in both source code and metadata; this is why a compiler is called a "compiler" -- it compiles a collection of data about the sources and libraries.
When the C# compiler needs to, say, check whether an expression that is thrown is derived from or identical to System.Exception it pretends to do a global namespace lookup on System, and then it does a lookup on Exception, finds the class, and then compares the resulting class information to the type that was deduced for the expression.
The compiler team uses this technique because that way it works no matter whether we are compiling your source code and System.Exception is in metadata, or if we are compiling mscorlib itself and System.Exception is in source.
Of course as a performance optimization the compiler actually has a list of "known types" and populates that list early so that it does not have to undergo the expense of doing the lookup every time. As you can imagine, the number of times you'd have to look up the built-in types is extremely large. Once the list is populated then the type information for System.Exception can be just read out of the list without having to do the lookup.
What happens when a new version of the compiler or the class libraries is released?
What happens is: a whole bunch of developers, testers, managers, designers, writers and educators get together and spend a few million man-hours making sure that the compiler and the class libraries all work before they're released.
This question is, again, impossibly vague. What has to happen to make a new compiler release? A lot of work, that's what has to happen.
I am aware that this is still pretty vague, and I don't expect a precise, single-paragraph answer; rather, pointers to literature or blog posts are most welcome.
I write a blog about, among other things, the design of the C# language and its compiler. It's at http://ericlippert.com.
I would assume (perhaps wrongly) that the language itself is designed independently of the class libraries that would eventually become available for it.
Your assumption is, in the case of C#, completely wrong. C# 1.0, the CLR 1.0 and the .NET Framework 1.0 were all designed together. As the language, runtime and framework evolved, the designers of each worked very closely together to ensure that the right resources were allocated so that each could ship new features on time.
I do not understand where your completely false assumption comes from; that sounds like a highly inefficient way to write a high-level language and a great way to miss your deadlines.
I can see writing a language like C, which is basically a more pleasant syntax for assembler, without a library. But how would you possibly write, say, async-await without having the guy designing Task<T> in the room with you? It seems like an exercise in frustration.
Am I correct in thinking that language design is tightly coupled to that of its base class libraries?
In the case of C#, yes, absolutely. There are dozens of types that the C# language assumes are available and as-documented in order to work correctly.
I once spent a very frustrating hour with a developer who was having some completely crazy problem with a foreach loop before I discovered that he had written his own IEnumerable<T> that had slightly different methods than the real IEnumerable<T>. The solution to his problem: don't do that.
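A side note that helps explain how that can even compile (my own sketch, not Lippert's code): foreach is bound by the compiler to a GetEnumerator/MoveNext/Current member pattern, so a type can be enumerated without implementing the real IEnumerable<T> at all, provided the members match what the compiler expects:

using System;

class Countdown
{
    public CountdownEnumerator GetEnumerator()
    {
        return new CountdownEnumerator(3);
    }
}

class CountdownEnumerator
{
    private int current;
    public CountdownEnumerator(int start) { current = start + 1; }
    public int Current { get { return current; } }
    public bool MoveNext() { return --current > 0; }
}

class ForeachDemo
{
    static void Main()
    {
        foreach (int i in new Countdown())  // compiles via the pattern
            Console.WriteLine(i);           // prints 3, 2, 1
    }
}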
How are these dependencies managed from within the compiler and run-time?
I don't know how to even begin to answer this impossibly vague question.
All (practical) programming languages have a minimum number of required functions. For modern "OO" languages, this also includes a minimum number of required types.
If the type is required in the Language Specification, then it is required - regardless of how it is packaged.
Conversely, not all of the BCL is required to have a valid C# implementation, because not all of the BCL types are required by the Language Specification. For instance, System.Exception (see #16.2) and NullReferenceException are required, but FileNotFoundException is not required to implement the C# language.
Note that even though the specification provides minimal definitions for base types (e.g. System.String), it does not define the commonly accepted methods (e.g. String.Replace). That is, almost all of the BCL is outside the scope of the Language Specification1.
.. but my conclusion is that the design of the language does depend on the existence and behaviour of specific elements in the class libraries.
I agree entirely and have included examples (and limits of such definitions) above.
.. If I were to design a new programming language, what techniques would I use to map the semantics of "throw" to the very particular class that is "Exception"?
I would not look primarily at the C# specification, but rather at the Common Language Infrastructure specification. This new language should, for practical reasons, be designed to interoperate with existing CLI/CLR languages, but it does not necessarily need to "be C#".
1 The CLI (and associated references) do define the requirements of a minimal BCL. So if it is taken that a valid C# implementation must conform to (or may assume) the CLI then there are many other types to consider that are not mentioned in the C# specification itself.
Unfortunately, I do not have sufficient knowledge to address the 2nd (and more interesting) question.
My impression is that, in languages like C# and Ada, application source code is portable, but standard library source code is not portable across compilers/implementations.

is "Double-Checked Locking is Broken" a java-only thing?

The page at http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html says that double-checked locking is flawed in Java. I'm just wondering: does it also apply to other languages (C#, VB, C++, etc.)?
I've read Double checked locking pattern: Broken or not?, Is this broken double checked locking?, and How to solve the "Double-Checked Locking is Broken" Declaration in Java?. To be truthful, I don't know what the common consensus is; some say yes, it's broken, others say no.
Anyway, my question is: does it also apply to other languages (C#, VB, C++, etc.)?
Double checked locking is safe in Java, PROVIDED THAT:
the instance variable is declared as volatile, AND
the JVM correctly implements the JSR-133 specification; i.e. it is compliant with Java 5 and later.
My source is the JSR-133 (Java Memory Model) FAQ - Jeremy Manson and Brian Goetz, February 2004. This is confirmed by Goetz in a number of other places.
However, as Goetz says, this is an idiom whose time has passed. Uncontended synchronization in Java is now fast, so he recommends that you just declare the getInstance() method as synchronized if you need to do lazy initialization. (And I imagine that this applies to other languages too ...)
Besides, all things being equal, it is a bad idea to write code that works in Java 5 but is unreliable in older JVMs.
OK, so what about the other languages? Well, it depends on how the idiom is implemented, and often on the platform.
C# - according to https://stackoverflow.com/a/1964832/139985, it is platform-dependent whether the instance variable needs to be volatile. However, Wikipedia says that if you do use volatile or explicit memory barriers, the idiom can be implemented safely (see the sketch after this list).
VB - according to Wikipedia the idiom can be implemented safely using explicit memory barriers.
C++ - according to Wikipedia the idiom can be implemented safely using volatile in Visual C++ 2005. But other sources say that in general the C++ language specification doesn't provide sufficient guarantees for volatile to be sure. However double-checked locking can be implemented in the context of the C++ 2011 language revision - https://stackoverflow.com/a/6099828/139985.
(Note: I'm just summarizing some sources I found which seem to me to be recent ... and sound. I'm not C++, C# or VB expert. Please read the linked pages and make your own judgements.)
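A minimal sketch of the volatile-based C# variant referred to in the list above (a standard formulation, not code from any one of the linked answers):

public sealed class Singleton
{
    private static volatile Singleton instance;
    private static readonly object padlock = new object();

    private Singleton() { }

    public static Singleton Instance
    {
        get
        {
            if (instance == null)          // first check, without the lock
            {
                lock (padlock)
                {
                    if (instance == null)  // second check, under the lock
                        instance = new Singleton();
                }
            }
            return instance;
        }
    }
}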
This Wikipedia article covers Java, C++ and .NET (C#/VB): http://en.wikipedia.org/wiki/Double-checked_locking
This is a tricky question, with a mine-field of contradictory information out there.
A part of the problem is that there are a few variants of double-checked locking:
The field checked on the fast path may be volatile or not.
There is a one-field variant and a two-field variant of double-checked locking.
And not only that, different authors have different definitions of what it means for the pattern to be "correct".
Definition #1: A widely accepted specification of the programming language (e.g. ECMA for C#) guarantees that the pattern is correct.
Definition #2: The pattern works in practice on a particular architecture (typically x86).
As disagreeable as it might seem, a lot of code out there depends on Definition #2.
Let's take C# as an example. In C#, the double-checked pattern (as typically implemented) is correct according to Definition #1 if and only if the field is volatile. But if we consider Definition #2, pretty much all variants are correct on x86 (i.e., happen to work), even if the field is non-volatile. On Itanium, the one-field variant happens to work if the field is non-volatile, but not the two-field variant.
The unfortunate consequence is that you'll find articles making clearly contradictory statements on the correctness of this pattern.
As others have said, this idiom has had its time. FWIW, for lazy initialization, .NET now provides a built-in class: System.Lazy<T> (msdn). I don't know if something similar is available in Java, though.
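For illustration, a minimal sketch of lazy initialization with System.Lazy<T> (available since .NET 4; my example):

using System;

public sealed class Singleton
{
    // Thread-safe by default (LazyThreadSafetyMode.ExecutionAndPublication).
    private static readonly Lazy<Singleton> instance =
        new Lazy<Singleton>(() => new Singleton());

    private Singleton() { }

    public static Singleton Instance
    {
        get { return instance.Value; }
    }
}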
It was flawed in Java; it was fixed in Java 5. The fact that it was broken was more of an implementation issue coupled with a misunderstanding than a technically "bad idea".

Why are public fields faster than properties?

I was poking around in XNA and saw that the Vector3 type in it was using public fields instead of properties. I tried a quick benchmark and found that, for a struct, the difference is quite dramatic (adding two Vectors together 100 million times took 2.0s with properties and 1.4s with fields). For a reference type, the difference doesn't seem to be that large, but it is there.
So why is that? I know that a property is compiled into get_X and set_X methods, which would incur a method call overhead. However, don't these simple getters/setters always get in-lined by the JIT? I know you can't guarantee what the JIT decides to do, but surely this is fairly high on the list of probability? What else is there that separates a public field from a property at the machine level?
And one thing I've been wondering: how is an auto-implemented property (public int Foo { get; set; }) 'better' OO-design than a public field? Or better said: how are those two different? I know that making it a property is easier with reflection, but anything else? I bet the answer to both questions is the same thing.
BTW: I am using .NET 3.5 SP1, which I believe fixed issues where methods with structs (or methods of structs, I'm not sure) weren't inlined, so that isn't it. I think I am using it at least; it's certainly installed, but then again, I'm using Vista 64-bit with SP1, which should have DX10.1 except that I don't have DX10.1...
Also: yeah, I've been running a release build :)
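For reference, a minimal sketch of the kind of benchmark described above (the Vec3 types, names and counts are illustrative, not the original code):

using System;
using System.Diagnostics;

struct Vec3Fields
{
    public float X, Y, Z;
}

struct Vec3Props
{
    public float X { get; set; }
    public float Y { get; set; }
    public float Z { get; set; }
}

class FieldVsPropertyBenchmark
{
    const int Iterations = 100000000;

    static void Main()
    {
        var f = new Vec3Fields { X = 1, Y = 2, Z = 3 };
        var sw = Stopwatch.StartNew();
        float sum = 0;
        for (int i = 0; i < Iterations; i++)
            sum += f.X + f.Y + f.Z;          // direct field access
        Console.WriteLine("fields: {0} ms ({1})", sw.ElapsedMilliseconds, sum);

        var p = new Vec3Props { X = 1, Y = 2, Z = 3 };
        sw = Stopwatch.StartNew();
        sum = 0;
        for (int i = 0; i < Iterations; i++)
            sum += p.X + p.Y + p.Z;          // property (method call) access
        Console.WriteLine("props: {0} ms ({1})", sw.ElapsedMilliseconds, sum);
    }
}

Run it as an optimized release build outside the debugger; as discussed below, the x86 and x64 JITs can give very different results.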
EDIT: I appreciate the quick answers guys, but I indicated that I do know that a property access is a method call, but that I don't know why the, presumably, in-lined method is slower than a direct field access.
EDIT 2: So I created another struct that used explicit GetX() methods (o how I don't miss my Java days at all) and that performed the same whether I disabled in-lining on it (through [MethodImplAttribute(MethodImplOptions.NoInlining)]) or not, so conclusion: non-static methods are apparently never inlined, not even on structs.
I thought that there were exceptions where the JIT could optimize the virtual method call away. Why can't this happen for structs, which know no inheritance, so a method call can only point to one possible method, right? Or is that because you can implement an interface on it?
This is kind of a shame, since it will really make me think about using properties on performance critical stuff, yet using fields makes me feel dirty and I might as well write what I'm doing in C.
EDIT 3: I found this posting about the exact same subject. His end conclusion is that the property call did get optimized away. I also could've sworn that I've read plenty of times that simple getter/setter properties will get in-lined, despite being callvirt in the IL. So am I going insane?
EDIT 4: Reed Copsey posted the answer in a comment below:
Re: Edit3 - see my updated comment: I believe this is x86 JIT vs x64 JIT issues. the JIT in x64 is not as mature. I'd expect MS to improve this quickly as more 64 bit systems are coming online every day. – Reed Copsey
And my response to his answer:
Thanks, this is the answer! I tried forcing a x86 build and all methods are equally fast, and much faster than the x64. This is very shocking to me actually, I had no idea I was living in the stone age on my 64-bit OS.. I'll include your comment in my answer so it stands out better. – JulianR
Thanks everyone!
Edit 2:
I had another potential thought here:
You mentioned that you are running on x64. I've tested this same issue on x86, and seen the same performance when using auto-properties vs. fields. However, if you look around on Connect and mailing list/forum posts, there are many references online to the fact that the x64 CLR's JIT is a different code base, and has very different performance characteristics to the x86 JIT. My guess is this is one place where x64 is still lagging behind.
Also, FYI, the struct/method/etc thing fixed in .net 3.5sp1 was on the x86 side, and was the fact that method calls that took structs as a parameter would never be inlined on x86 prior to .net3.5sp1. That's pretty much irrelevant to this discussion on your system.
Edit 3:
Another thing: As to why XNA is using fields. I actually was at the Game Fest where they announced XNA. Rico Mariani gave a talk where he brought up many of the same points that are on his blog. It seems the XNA folks had similar ideas when they developed some of the core objects. See:
http://blogs.msdn.com/ricom/archive/2006/09/07/745085.aspx
Particularly, check out point #2.
As for why automatic properties are better than public fields:
They allow you to change the implementation in v2 of your class, and add logic into the property get/set routines as needed, without changing your interface to your end users. This can have a profound effect on your ability to maintain your library and code over time.
---- From original post - but discovered this wasn't the issue--------
Were you running a release build outside of VS? That can be one explanation for why things aren't being optimized. Often, if you are running in VS, even with an optimized release build, the VS host process disables many functions of the JIT. This can cause performance benchmarks to change.
You should read this article by Vance. It goes into detail about why methods are not always inlined by the JIT'er even if it looks completely obvious that they should be.
http://blogs.msdn.com/vancem/archive/2008/08/19/to-inline-or-not-to-inline-that-is-the-question.aspx
Public fields are direct assignments. Properties are methods, and so involve more code; insignificant, but more.
XNA has to target the XBox 360, and the JIT in the .NET Compact Framework isn't as sophisticated as its desktop counterpart. The .NET CF JIT'er won't inline property methods.
Accessing a field is just a memory reference, whereas using a property actually invokes a method and includes the function call overhead. The reason to use properties rather than fields is to insulate your code from changes and provide better granularity over access. By not exposing your field directly you have greater control over how the access is done. Using automatic properties allows you to get the typical getter/setter behavior but builds in the ability to change this without a subsequent need for the changes to propagate to other parts of the code.
For example, say that you want to change your code so that access to a field is controlled by the current user's role. If you had exposed the field publicly, you'd have to touch every part of the code that accessed it. Exposing it through a property allows you to modify the property code to add your new requirement but doesn't result in unnecessary changes to any code that accesses it.
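A sketch of that role-check scenario (the role API here is a hypothetical stand-in, not a real framework call):

using System;

public class Document
{
    private string content;

    public string Content
    {
        get { return content; }
        set
        {
            if (!CurrentUserIsEditor())  // the new requirement lives here only
                throw new UnauthorizedAccessException("Editors only.");
            content = value;
        }
    }

    private static bool CurrentUserIsEditor()
    {
        return true;  // hypothetical stand-in for a real role check
    }
}

Callers keep writing doc.Content = "..." unchanged; only the property body had to change.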

What is the minimum knowledge of CLR a .NET programmer must have to be a good programmer?

When we talk about the .NET world the CLR is what everything we do depends on.
What is the minimum knowledge of CLR a .NET programmer must have to be a good programmer?
Can you give me one/many you think is/are the most important subjects:
GC?, AppDomain?, Threads?, Processes?, Assemblies/Fusion?
I would very much appreciate it if you posted links to articles, blogs, books or other resources on the topic where more information can be found.
Update: I noticed from some of the comments that my question was not clear to some. When I say CLR I don't mean the .NET Framework. It is NOT about memorizing the .NET libraries; it is rather about understanding how the execution environment (in which those libraries live at runtime) works.
My question was directly inspired by John Robbins, the author of the "Debugging Applications for Microsoft® .NET" book (which I recommend) and a colleague at Wintellect of the Jeffrey Richter cited here. In one of the introductory chapters he says that "...any .NET programmer should know what probing is and how assemblies are loaded into the runtime". Do you think there are other such things?
Last Update: After having read the first 5 chapters of "CLR via C#" I must say to anyone reading this: if you haven't already, read this book!
Most of those are way deeper than the kind of thing many developers fall down on in my experience. Most misunderstood (and important) aspects in my experience:
Value types vs reference types
Variables vs objects
Pass by ref vs pass by value
Delegates and events
Distinguishing between language, runtime and framework
Boxing
Garbage collection
On the "variables vs objects" front, here are three statements about the code
string x = "hello";
(Very bad) x is a string with 5 letters
(Slightly better) x is a reference to a string with 5 letters
(Correct) The value of x is a reference to a string with 5 letters
Obviously the first two are okay in "casual" conversation, but only if everyone involved understands the real situation.
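A small follow-up on why the distinction matters (my illustration, not Jon's code): copying a variable copies the reference, not the object.

using System;
using System.Text;

class VariablesVsObjects
{
    static void Main()
    {
        StringBuilder a = new StringBuilder("hi");
        StringBuilder b = a;   // copies the value of a, which is a reference
        b.Append("!");         // mutates the one object both variables refer to
        Console.WriteLine(a);  // prints "hi!"
    }
}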
A great programmer cannot be measured by the quantity of things he knows about the CLR. Sure it's a nice beginning, but he must also know OOP/D/A and a lot of other things like Design Patterns, Best Practices, O/RM concepts etc.
Fact is, I'd say a "great .NET programmer" doesn't necessarily need to know much about the CLR at all, as long as he has great knowledge of general programming theory and concepts...
I would rather hire a "great Java developer" with great general knowledge and experience in Java for a .NET job than a "master" of .NET who has little experience and thinks O/RM is a stock ticker and stored procedures are a great way to "abstract away the database"...
I've seen professional teachers of .NET completely fail to do really simple things without breaking their backs, due to a lack of "general knowledge", while at the same time they "know everything" there is to know about .NET and the CLR...
Updated: I am reading the relevant parts of the book CLR via C# by Jeffrey Richter... this book can be a good reference.
You should know about memory management and delegates.
Jon's answer seems pretty complete to me (plus delegates), but I think what fundamentally separates a good programmer from an average one is answering the why questions rather than the how. It's great to know how garbage collection works and how value types and reference types work, but it's a whole other level to understand when to use a value type vs. a reference type. It's the difference between speaking in a language vs. giving a speech in a language (it's all about how we apply the knowledge we have and how we arrive at those decisions).
Jon's answer is good. Those are all fairly basic but important areas that a lot of developers do not have a good understanding of. I think knowing the difference between value and reference types ties in to a basic understanding of how the GC in .NET behaves, but, more importantly, a good understanding of the Dispose pattern is important.
The rest of the areas you mention are either very deep knowledge about the CLR itself or more advanced concepts that aren't widely used (yet). [.NET 4.0 will start to change some of that with the introduction of the parallel extensions and MEF.]
One thing that can be really tricky to grasp is deferred execution and the like.
How do you explain how a method that returns an IEnumerable works? What does a delegate really do? Things like that.
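A minimal sketch of that deferred execution (my example): nothing in the iterator body runs until the sequence is actually enumerated.

using System;
using System.Collections.Generic;

class DeferredDemo
{
    static IEnumerable<int> Numbers()
    {
        Console.WriteLine("side effect");  // runs on the first MoveNext, not at call time
        yield return 1;
        yield return 2;
    }

    static void Main()
    {
        var seq = Numbers();       // nothing printed yet
        foreach (var n in seq)     // "side effect" prints here, then 1 and 2
            Console.WriteLine(n);
    }
}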
