According to http://referencesource.microsoft.com/#mscorlib/system/runtime/interopservices/safebuffer.cs
SafeBuffer uses the aligned size of the struct type rather than the actual size of the struct type. It appears this causes alignment issues when writing what needs to be a densely packed array of structures and when reading from a preexisting densely packed non-aligned array of structures in the buffer. In the first case, the use of the aligned rather than the actual size results in unwanted padding bytes. In the second, the data gets mangled. I have two questions (4 really, but 3 are related):
Is there a way around this other than manually aligning access using sequential calls to SafeBuffer.Write<T> / Read<T> (which is slower), or ditching the SafeBuffer class (and therefore the quite nice UnmanagedMemoryAccessor class) entirely?
What are the reasons behind this choice? Why is the CLR enforcing its own alignment requirements on unmanaged memory? Why should this not be considered a bug?
Hmya, answers to these questions are invariably subjective; we don't have the .NET Framework designers contributing here to pass their design meeting notes to us. But you can safely assume that this is not a bug and that it was agonized over a great deal. It is surely at least one of the reasons it took so long for MMFs to be supported in .NET.
Everybody loves to ignore or wish away structure packing and alignment details. The CLR does a terrific job of hiding them. But the buck stops here; there is no way to ignore them anymore. The cold hard fact is that it is entirely impossible to make everybody happy. The framework has no reasonable way to guess what the code on the other side of the MMF looks like. It is unknowable; MMFs are entirely too simplistic to support anything like metadata. One clear failure mode is having a 32-bit process on one end and a 64-bit process on the other: they use different alignment choices, 4 vs 8. There are many more, particularly if it is native code on the other end using its own #pragma pack.
Given that the framework can never get it 100% right, they chose to at least make it right and efficient when .NET code runs on either side. An entirely reasonable choice.
The only real flaw is that the documentation is lacking. You will have a headache when you need to interop with native code. Trial and error is, right now, the only good way. Or asking a question about the specific problem you have at SO of course :)
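If you do need dense packing today, the sequential Read/Write workaround the question already mentions looks roughly like this. A minimal sketch, not a definitive fix: the helper names are mine, and it assumes Marshal.SizeOf reflects the packing used on the other side of the buffer.

using System;
using System.Runtime.InteropServices;

static class PackedBufferIO
{
    // Write a densely packed array: advance by Marshal.SizeOf,
    // not the aligned size that SafeBuffer.WriteArray uses internally.
    public static void WritePacked<T>(SafeBuffer buffer, ulong offset, T[] items)
        where T : struct
    {
        ulong stride = (ulong)Marshal.SizeOf(typeof(T));
        for (int i = 0; i < items.Length; i++)
            buffer.Write(offset + (ulong)i * stride, items[i]);
    }

    // Read from a preexisting densely packed, unaligned array of structs.
    public static T[] ReadPacked<T>(SafeBuffer buffer, ulong offset, int count)
        where T : struct
    {
        ulong stride = (ulong)Marshal.SizeOf(typeof(T));
        var items = new T[count];
        for (int i = 0; i < count; i++)
            items[i] = buffer.Read<T>(offset + (ulong)i * stride);
        return items;
    }
}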
I've always wondered how the dependencies are managed from a programming language to its libraries. Take for example C#. When I was beginning to learn about computing, I would assume (wrongly as it turns out) that the language itself is designed independently of the class libraries that would eventually become available for it. That is, the set of language keywords (such as for, class or throw) plus the syntax and semantics are defined first, and libraries that can be used from the language are developed separately. The specific classes in those libraries, I used to think, should not have any impact on the design of the language.
But that doesn't work, or not all the time. Consider throw. The C# compiler makes sure that the expression following throw resolves to an exception type. Exception is a class in a library, and as such it should not be special at all. It would be a class as any other, except that the C# compiler assigns it that special semantics. That is very good, but my conclusion is that the design of the language does depend on the existence and behaviour of specific elements in the class libraries.
Additionally, I wonder how this dependency is managed. If I were to design a new programming language, what techniques would I use to map the semantics of throw to the very particular class that is Exception?
So my questions are two:
Am I correct in thinking that language design is tightly coupled to that of its base class libraries?
How are these dependencies managed from within the compiler and run-time? What techniques are used?
Thank you.
EDIT. Thanks to those who pointed out that my second question is very vague. I agree. What I am trying to learn is what kind of references the compiler stores about the types it needs. For example, does it find the types by some kind of unique id? What happens when a new version of the compiler or the class libraries is released? I am aware that this is still pretty vague, and I don't expect a precise, single-paragraph answer; rather, pointers to literature or blog posts are most welcome.
What I am trying to learn is what kind of references the compiler stores about the types it needs. For example, does it find the types by some kind of unique id?
Obviously the C# compiler maintains an internal database of all the types available to it in both source code and metadata; this is why a compiler is called a "compiler" -- it compiles a collection of data about the sources and libraries.
When the C# compiler needs to, say, check whether an expression that is thrown is derived from or identical to System.Exception, it effectively does a global namespace lookup on System, then does a lookup on Exception, finds the class, and then compares the resulting class information to the type that was deduced for the expression.
The compiler team uses this technique because that way it works no matter whether we are compiling your source code and System.Exception is in metadata, or if we are compiling mscorlib itself and System.Exception is in source.
Of course as a performance optimization the compiler actually has a list of "known types" and populates that list early so that it does not have to undergo the expense of doing the lookup every time. As you can imagine, the number of times you'd have to look up the built-in types is extremely large. Once the list is populated then the type information for System.Exception can be just read out of the list without having to do the lookup.
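As a purely hypothetical sketch of that optimization (the names here are illustrative, not the real compiler's internals), it amounts to a table that is filled in once, early, and consulted thereafter:

using System.Collections.Generic;

enum WellKnownType { SystemObject, SystemString, SystemException /* ... */ }

class TypeSymbol { /* class information gathered during compilation */ }

class CompilationContext
{
    private readonly Dictionary<WellKnownType, TypeSymbol> _knownTypes =
        new Dictionary<WellKnownType, TypeSymbol>();

    // Populated early, via the same lookup path used for ordinary names:
    // global namespace -> System -> Exception (from source or metadata).
    public void RegisterKnownType(WellKnownType key, TypeSymbol symbol)
    {
        _knownTypes[key] = symbol;
    }

    // Thereafter, e.g. checking a 'throw' expression reads the cached
    // symbol instead of repeating the lookup.
    public TypeSymbol GetWellKnownType(WellKnownType key)
    {
        return _knownTypes[key];
    }
}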
What happens when a new version of the compiler or the class libraries is released?
What happens is: a whole bunch of developers, testers, managers, designers, writers and educators get together and spend a few million man-hours making sure that the compiler and the class libraries all work before they're released.
This question is, again, impossibly vague. What has to happen to make a new compiler release? A lot of work, that's what has to happen.
I am aware that this is still pretty vague, and I don't expect a precise, single-paragraph answer; rather, pointers to literature or blog posts are most welcome.
I write a blog about, among other things, the design of the C# language and its compiler. It's at http://ericlippert.com.
I would assume (perhaps wrongly) that the language itself is designed independently of the class libraries that would eventually become available for it.
Your assumption is, in the case of C#, completely wrong. C# 1.0, the CLR 1.0 and the .NET Framework 1.0 were all designed together. As the language, runtime and framework evolved, the designers of each worked very closely together to ensure that the right resources were allocated so that each could ship new features on time.
I do not understand where your completely false assumption comes from; that sounds like a highly inefficient way to write a high-level language and a great way to miss your deadlines.
I can see writing a language like C, which is basically a more pleasant syntax for assembler, without a library. But how would you possibly write, say, async-await without having the guy designing Task<T> in the room with you? It seems like an exercise in frustration.
Am I correct in thinking that language design is tightly coupled to that of its base class libraries?
In the case of C#, yes, absolutely. There are dozens of types that the C# language assumes are available and as-documented in order to work correctly.
I once spent a very frustrating hour with a developer who was having some completely crazy problem with a foreach loop before I discovered that he had written his own IEnumerable<T> that had slightly different methods than the real IEnumerable<T>. The solution to his problem: don't do that.
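A hypothetical reconstruction of that kind of trap (not the actual code from that hour): a home-grown interface that shadows the BCL one compiles fine, but the compiler treats it as an unrelated type, so foreach and the rest of the framework won't cooperate.

using System.Collections.Generic;

namespace MyHomebrew
{
    // Same name as the BCL interface, but an unrelated type
    // with "slightly different methods".
    public interface IEnumerable<T>
    {
        IEnumerator<T> GetItems();
    }

    public class NumberBag : IEnumerable<int>
    {
        public IEnumerator<int> GetItems() { yield return 42; }
    }

    // Elsewhere:
    //   foreach (int n in new NumberBag()) { ... }
    // does not compile: NumberBag has no GetEnumerator() and does not
    // implement the real System.Collections.Generic.IEnumerable<int>.
}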
How are these dependencies managed from within the compiler and run-time?
I don't know how to even begin to answer this impossibly vague question.
All (practical) programming languages have a minimum number of required functions. For modern "OO" languages, this also includes a minimum number of required types.
If the type is required in the Language Specification, then it is required - regardless of how it is packaged.
Conversely, not all of the BCL is required to have a valid C# implementation. This is because not all of the BCL types are required by the Language Specification. For instance, System.Exception (see #16.2) and NullReferenceException are required, but FileNotFoundException is not required to implement the C# Language.
Note that even though the specification provides minimal definitions for base types (e.g. System.String), it does not define the commonly-accepted methods (e.g. String.Replace). That is, almost all of the BCL is outside the scope of the Language Specification.[1]
.. but my conclusion is that the design of the language does depend on the existence and behaviour of specific elements in the class libraries.
I agree entirely and have included examples (and limits of such definitions) above.
.. If I were to design a new programming language, what techniques would I use to map the semantics of "throw" to the very particular class that is "Exception"?
I would not look primarily at the C# specification, but rather at the Common Language Infrastructure specification. This new language should, for practical reasons, be designed to interoperate with existing CLI/CLR languages, but it does not necessarily need to "be C#".
[1] The CLI (and associated references) do define the requirements of a minimal BCL. So if it is taken that a valid C# implementation must conform to (or may assume) the CLI, then there are many other types to consider that are not mentioned in the C# specification itself.
Unfortunately, I do not have sufficient knowledge of the 2nd (and more interesting) question.
My impression is that in languages like C# and Ada, application source code is portable across compilers/implementations, but standard library source code is not.
Quick little question...
I know that sometimes in other languages libraries have part of their code written in platform-specific straight C for performance reasons. In such cases you can get huge performance gains by using library code wherever possible.
So does the .NET platform do this? Is Microsoft's implementation of the Base Class Library optimized in some way that I can't hope to match in managed code?
What about something little like using KeyValuePair as a type-safe tuple struct instead of writing my own?
As far as I know, the .NET Framework hasn't been compiled in a way that creates hooks into some otherwise-inaccessible hardware acceleration or something like that, so for simple things like KeyValuePair and Tuple, you're probably safe rolling your own.
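For illustration, a home-grown pair along these lines is trivial to write, and there is no hidden native magic in the BCL version that it would be missing (Pair is my own name here, not a framework type):

public struct Pair<TFirst, TSecond>
{
    public readonly TFirst First;
    public readonly TSecond Second;

    public Pair(TFirst first, TSecond second)
    {
        First = first;
        Second = second;
    }
}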
However, there are a number of other advantages to using standard framework classes, and I'd hesitate to write my own without a strong reason.
They're already written, so why give yourself extra work?
Microsoft has put their code through a pretty rigorous vetting process, so there's a good chance that their code will be more correct and more efficient than yours will.
Other developers that have to look at your code will know exactly what to expect when they see standard framework classes being used, whereas your home-brewed stuff might make them scratch their heads for a while.
Update
@gordy also makes a good point, that the standard framework classes are being used by everybody and their dog, so there will be a slight performance gain simply due to the fact that:
the class likely won't have to be statically instantiated or just-in-time compiled just for your code,
the class's instructions are more likely to be in the cache already, since other parts of the code will probably have used them recently. By reusing them, you avoid loading extra code into the cache in the first place, and you avoid evicting other code that may be needed again soon.
I've wondered this myself but I suspect that it's not the case since you can "decompile" all of base libraries in Reflector.
There's probably still a performance advantage over homemade stuff in that the code is likely jitted already and cached.
I suggest you use built-in classes most of the time, UNLESS YOU'VE MEASURED IT'S NOT FAST ENOUGH.
I'm pretty sure MS put a lot of time and effort into building something fast and reliable. It is entirely possible you can beat them... after a few weeks of effort. I just don't think it is worth the time in most cases.
The only time it seems ok to rewrite something is when it does not do all that you want. Just be aware of the time cost and the associated difficulty.
Could you ever hope to match the performance? Possibly, though keep in mind their code has been thoroughly tested and heavily optimized, so I'd say it's not a worthwhile effort unless you have a very specific need that no BCL type directly fulfills.
And .NET 4.0 already has a good Tuple<> implementation. Though in previous versions of .NET you'd have to roll your own if you need anything bigger than a KeyValuePair.
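For example, a minimal usage sketch of the .NET 4.0 Tuple<>:

using System;

class TupleDemo
{
    static void Main()
    {
        // Tuple.Create infers the arity and element types for you.
        var triple = Tuple.Create(3, 4, 5);              // Tuple<int, int, int>
        Console.WriteLine(triple.Item1 + triple.Item2);  // 7
    }
}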
The real performance gain comes from the fact that the MS team built and tested the library methods. You can rest assured with a very high degree of comfort that the objects will behave without introducing bugs.
Then there is the matter of re-inventing the wheel. You'd really have to have a great reason for doing so.
The main performance gains always lie in architecture and in complex algorithms; the language rarely matters.
The Microsoft Base Class Library documentation usually states the complexity of its "heavy" methods, so you can easily decide whether to use them or to implement a faster algorithm yourself.
Of course, when it comes to heavy algorithms (graphics, archiving, etc.), the performance gains from dropping to a lower-level language come in handy.
Is there any design reason why C# doesn't support manual inlining (like the reason they gave up multiple inheritance)? Or was it just not important enough?
And the same question applies to optional parameters in methods... these were already in the first version of VB.NET, so it was surely not laziness that caused MS not to allow optional parameters; it was probably an architectural decision. And it seems they have had a change of heart about that, because C# 4 is going to include them.
What was the decision and why did they give it up?
Edit:
Maybe readers didn't fully understand me. I'm working lately on a calculation program (support numbers of any size, to the last digit), in which some methods are used millions of times per second.
Say I have a method called Add(int num), and this method is used quite a lot with 1 as the parameter (Add(1);). I've found out it is faster to implement a special method especially for one. And I don't mean overloading: writing a new method called AddOne, and literally copying the Add method into it, except that instead of using num I'm writing 1. This might seem horribly weird to you, but it's actually faster (as ugly as it is).
That made me wonder why C# doesn't support manual inline which can be amazingly helpful here.
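To make the trick concrete, here is a hypothetical sketch of what I mean (digits and Normalize stand in for my real implementation):

public sealed class BigNumber
{
    private int[] digits = new int[1000];

    public void Add(int num)
    {
        digits[0] += num;   // general path: 'num' is a variable
        Normalize();        // ripple the carries through the digits
    }

    // A literal copy of Add with 'num' replaced by the constant 1;
    // ugly, but measurably faster in my tests.
    public void AddOne()
    {
        digits[0] += 1;
        Normalize();
    }

    private void Normalize() { /* carry propagation elided */ }
}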
Edit 2:
I asked myself whether or not to add this. I'm very well aware of the weirdness (and disadvantages) of choosing a platform such as .NET for such a project, but I think .NET's optimizations are more important than you might think... especially features such as Any CPU, etc.
To answer part of your question, see Eric Gunnerson's blog post: Why doesn't C# have an 'inline' keyword?
A quote from his post:
For C#, inlining happens at the JIT level, and the JIT generally makes a decent decision.
EDIT: I'm not sure of the reason for the delayed support for optional parameters; however, saying they "gave up" on it sounds as though they were expected to implement it based on our expectations of what other languages offered. I imagine it wasn't high on their priority list and they had deadlines to get certain features out the door for each version. It probably didn't rise in importance till now, especially since method overloading was an available alternative. Meanwhile we got generics (2.0) and the features that make LINQ possible, etc. (3.0). I'm happy with the progression of the language; the aforementioned features are more important to me than getting support for optional parameters early on.
Manual inlining would be almost useless. The JIT compiler inlines methods during native code compilation where appropriate, and I think in almost all cases the JIT compiler is better at guessing when it is appropriate than the programmer.
As for optional parameters, I don't know why they weren't there in previous versions. That said, I don't like having them in C# 4, because I consider them somewhat harmful: the default values get baked into the consuming assembly, and you have to recompile it if you change the defaults in a DLL and want the consuming assembly to use the new ones.
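A sketch of what I mean (hypothetical types; this needs the C# 4 compiler):

// In Library.dll:
public static class Library
{
    // If this default later changes to 2, consumers compiled against
    // the old version keep passing 1 until they are recompiled.
    public static void Log(string message, int level = 1) { }
}

// In Consumer.exe: the compiler rewrites the call below to
// Library.Log("hi", 1), embedding the default value at the call site.
public static class Consumer
{
    public static void Main() { Library.Log("hi"); }
}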
EDIT:
Some additional information about inlining. Although you cannot force the JIT compiler to inline a method call, you can force it to NOT inline a method call. For this, you use the System.Runtime.CompilerServices.MethodImplAttribute, like so:
internal static class MyClass
{
    // NoInlining tells the JIT compiler it must never inline this method.
    [System.Runtime.CompilerServices.MethodImplAttribute(
        System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
    private static void MyMethod()
    {
        //Powerful, magical code
    }

    //Other code
}
My educated guess: the reason earlier versions of C# didn't have optional parameters is because of bad experiences with them in C++. On the surface, they look straightforward enough, but there are a few bothersome corner cases. I think one of Herb Sutter's books describes this in more detail; in general, it has to do with overriding virtual methods. Maximilian has mentioned one of the .NET corner cases in his answer.
You can also pretty much get by without them by manually writing multiple overloads; that may not be very nice for the author of the class, but clients will hardly notice the difference between overloads and optional parameters.
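A sketch of that overload pattern (a made-up example): the "default" lives in exactly one place inside the library, so changing it never requires clients to recompile.

using System;
using System.IO;

public static class Printer
{
    // The one-argument overload supplies the default itself.
    public static void Print(string text)
    {
        Print(text, Console.Out);
    }

    public static void Print(string text, TextWriter destination)
    {
        destination.WriteLine(text);
    }
}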
So after all these years w/o them, why did C# 4.0 add them? 1) improved parity with VB.NET, and 2) easier interop with COM.
I'm working lately on a calculation program (support numbers of any size, to the last digit), in which some methods are used literally millions of times per second.
Then you chose a wrong language. I assume you actually profiled your code (right?) and know that there is nothing apart from micro-optimisations that can help you. Also, you're using a high-performance native bigint library and not writing your own, right?
If that's true, don't use .NET. If you think you can gain speed on partial specialisation, go to Haskell, C, Fortran or any other language that either does it automatically, or can expose inlining to you to do it by hand.
If Add(1) really matters to you, heap allocations will matter too.
However, you should really look at what the profiler can tell you...
C# has added them in 4.0: http://msdn.microsoft.com/en-us/library/dd264739(VS.100).aspx
As to why they weren't done from the beginning, it's most likely because they felt method overloads gave more flexibility. With overloading you can specify multiple 'defaults' based on the other parameters that you're taking. It's also not that much more syntax.
Even in languages like C++, inlining something doesn't guarantee that it'll happen; it's a hint to the compiler. The compiler can either take the hint, or do its own thing.
C# is another step removed from the generated assembly code (via IL + the JIT), so it becomes even harder to guarantee that something will inline. Furthermore, you have issues like the x86 + x64 implementations of the JIT differing in behaviour.
Java doesn't include an inline keyword either. The better Java JITs can inline even virtual methods, and the use of keywords like private or final makes no difference (it used to, but that is now ancient history).
I was poking around in XNA and saw that its Vector3 type uses public fields instead of properties. I tried a quick benchmark and found that, for a struct, the difference is quite dramatic (adding two Vectors together 100 million times took 2.0s with properties and 1.4s with fields). For a reference type, the difference doesn't seem to be that large, but it is there.
So why is that? I know that a property is compiled into get_X and set_X methods, which would incur a method call overhead. However, don't these simple getters/setters always get in-lined by the JIT? I know you can't guarantee what the JIT decides to do, but surely this is fairly high on the list of probability? What else is there that separates a public field from a property at the machine level?
And one thing I've been wondering: how is an auto-implemented property (public int Foo { get; set; }) 'better' OO-design than a public field? Or better said: how are those two different? I know that making it a property is easier with reflection, but anything else? I bet the answer to both questions is the same thing.
BTW: I am using .NET 3.5 SP1, which I believe fixed issues where methods with structs (or methods of structs, I'm not sure) weren't inlined, so that isn't it. At least I think I'm using it; it's certainly installed, but then again, I'm using Vista 64-bit with SP1, which should have DX10.1, except that I don't have DX10.1...
Also: yeah, I've been running a release build :)
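For reference, the benchmark was along these lines (reconstructed from memory, not the exact code): sum the components 100 million times through fields and then through auto-properties, in a release build outside the debugger.

using System;
using System.Diagnostics;

struct VecFields { public float X, Y, Z; }

struct VecProps
{
    public float X { get; set; }
    public float Y { get; set; }
    public float Z { get; set; }
}

static class FieldVsPropertyBenchmark
{
    const int Iterations = 100000000;

    static void Main()
    {
        var f = new VecFields { X = 1, Y = 2, Z = 3 };
        var p = new VecProps { X = 1, Y = 2, Z = 3 };

        var sw = Stopwatch.StartNew();
        float sum = 0;
        for (int i = 0; i < Iterations; i++) sum += f.X + f.Y + f.Z;
        Console.WriteLine("fields:     {0} ms (sum={1})", sw.ElapsedMilliseconds, sum);

        sw = Stopwatch.StartNew();
        sum = 0;
        for (int i = 0; i < Iterations; i++) sum += p.X + p.Y + p.Z;
        Console.WriteLine("properties: {0} ms (sum={1})", sw.ElapsedMilliseconds, sum);
    }
}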
EDIT: I appreciate the quick answers guys, but I indicated that I do know that a property access is a method call, but that I don't know why the, presumably, in-lined method is slower than a direct field access.
EDIT 2: So I created another struct that used explicit GetX() methods (oh, how I don't miss my Java days at all) and that performed the same whether I disabled inlining on it (through [MethodImplAttribute(MethodImplOptions.NoInlining)]) or not. So, conclusion: non-static methods are apparently never inlined, not even on structs.
I thought that there were exceptions, where the JIT could optimize the virtual method call away. Why can't this happen on structs, which know no inheritance, so that a method call can only point to one possible method? Or is that because you can implement an interface on it?
This is kind of a shame, since it will really make me think about using properties on performance critical stuff, yet using fields makes me feel dirty and I might as well write what I'm doing in C.
EDIT 3: I found this posting about the exact same subject. His end conclusion is that the property call did get optimized away. I also could've sworn that I've read plenty of times that simple getter/setter properties will get in-lined, despite being callvirt in the IL. So am I going insane?
EDIT 4: Reed Copsey posted the answer in a comment below:
Re: Edit3 - see my updated comment: I believe this is x86 JIT vs x64 JIT issues. the JIT in x64 is not as mature. I'd expect MS to improve this quickly as more 64 bit systems are coming online every day. – Reed Copsey
And my response to his answer:
Thanks, this is the answer! I tried forcing a x86 build and all methods are equally fast, and much faster than the x64. This is very shocking to me actually, I had no idea I was living in the stone age on my 64-bit OS.. I'll include your comment in my answer so it stands out better. – JulianR
Thanks everyone!
Edit 2:
I had another potential thought here:
You mentioned that you are running on x64. I've tested this same issue on x86, and seen the same performance when using auto-properties vs. fields. However, if you look around on Connect and mailing list/forum posts, there are many references online to the fact that the x64 CLR's JIT is a different code base, and has very different performance characteristics to the x86 JIT. My guess is this is one place where x64 is still lagging behind.
Also, FYI, the struct/method/etc thing fixed in .net 3.5sp1 was on the x86 side, and was the fact that method calls that took structs as a parameter would never be inlined on x86 prior to .net3.5sp1. That's pretty much irrelevant to this discussion on your system.
Edit 3:
Another thing: As to why XNA is using fields. I actually was at the Game Fest where they announced XNA. Rico Mariani gave a talk where he brought up many of the same points that are on his blog. It seems the XNA folks had similar ideas when they developed some of the core objects. See:
http://blogs.msdn.com/ricom/archive/2006/09/07/745085.aspx
Particularly, check out point #2.
As for why automatic properties are better than public fields:
They allow you to change the implementation in v2 of your class, and add logic into the property get/set routines as needed, without changing your interface to your end users. This can have a profound effect on your ability to maintain your library and code over time.
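For example, a hypothetical v1-to-v2 evolution: the public surface never changes, so callers never need to.

using System;

public class Account
{
    // v1 shipped an auto-property:
    // public decimal Balance { get; set; }

    // v2: same signature to callers, now with logic in the setter.
    private decimal _balance;
    public decimal Balance
    {
        get { return _balance; }
        set
        {
            if (value < 0) throw new ArgumentOutOfRangeException("value");
            _balance = value;
        }
    }
}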
---- From original post (but this turned out not to be the issue) ----
Were you running a release build outside of VS? That can be one explanation for why things aren't being optimized. Often, if you are running in VS, even an optimized release build, the VS host process disables many functions of the JIT. This can cause performance benchmarks to change.
You should read this article by Vance. It goes into detail about why methods are not always inlined by the JIT'er even if it looks completely obvious that they should be.
http://blogs.msdn.com/vancem/archive/2008/08/19/to-inline-or-not-to-inline-that-is-the-question.aspx
Public fields are direct assignments, while properties are methods; that means more code, insignificant but more.
XNA has to target the XBox 360, and the JIT in the .NET Compact Framework isn't as sophisticated as its desktop counterpart. The .NET CF JIT'er won't inline property methods.
Accessing a field is just a memory reference, whereas using a property actually invokes a method and incurs the function-call overhead. The reason to use properties rather than fields is to insulate your code from changes and provide better granularity over access. By not exposing your field directly, you have greater control over how access is done. Using automatic properties gives you the typical getter/setter behavior but builds in the ability to change this without requiring the changes to propagate to other parts of the code.
For example, say that you want to change your code so that access to a field is controlled by the current user's role. If you had exposed the field publicly, you'd have to touch every part of the code that accessed it. Exposing it through a property allows you to modify the property code to add your new requirement but doesn't result in unnecessary changes to any code that accesses it.
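A sketch of that example (the names are hypothetical): the new requirement lives in one place, inside the property, and no call sites change.

using System;
using System.Threading;

public class Document
{
    private string _contents;

    public string Contents
    {
        get
        {
            // New role-based requirement added here, in one place.
            if (!Thread.CurrentPrincipal.IsInRole("Reader"))
                throw new UnauthorizedAccessException();
            return _contents;
        }
        set { _contents = value; }
    }
}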