I stumbled upon a method today: Array.Initialize().
According to the documentation:
This method is designed to help compilers support value-type arrays; most users do not need this method.
How is this method responsible for making the compiler support value types? As far as I can tell, this method just:
Initializes every element of the value-type Array by calling the default constructor of the value type.
Also, why is it public? I don't see myself ever needing to call this method; arrays are already initialized when they are created, so calling it manually would be redundant and useless.
Even if my intention were to reset the values of an array, I would still not call it; I would create a new one: array = new int[array.Length].
So it seems that this method exists just for the sake of the compiler. Why is this? Can anyone give me some more details?
It's worth noting that the rules of .NET are different to the rules of C#.
There are things we can do in .NET that we can't do in C#, generally either because the code is not verifiable (ref return types for example) or because they could introduce some confusion.
In C# (before C# 10), structs cannot define a parameterless constructor, and calling new SomeValueType() works by creating a zero-filled portion of memory (all fields therefore being 0 for numeric types, null for reference types, and the result of this same rule again for other value types).
In .NET you can have a parameterless constructor on a value type.
It's probably a bad idea to do so. For one thing the rules about just when it is called and just when the memory of the value is zero-filled, and what happens upon assignment in different cases aren't entirely simple (e.g. new SomeValueType() will call it but new T() in a generic method where T is SomeValueType will not!). Life is simpler if the result of new SomeValueType() will always be zero-filling. That no doubt influenced the design of C# not allowing this even though .NET does.
For this reason, Array.Initialize() will never make sense on new arrays of any type written in C# under those rules, because calling the constructor and zero-filling are the same thing.
But by the same token, it's possible for a type to be written in another .NET language (at the very least, you can do it in CIL) that does have a parameterless constructor that actually has an effect. And for that reason it's possible that a compiler for such a language would want its equivalent of new SomeValueType[3] to call that constructor on all the elements of the array. It is therefore sensible to have a method in the framework that allows such a fill to be done, so that a compiler for such a language can make use of it.
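As a rough illustration of the zero-filling described above, here is a minimal C# sketch. The Point struct and the ZeroFillDemo class are hypothetical names made up for this example; the comment on Initialize() reflects the documented behaviour as I understand it, not something verified against every runtime.

```csharp
using System;

// Hypothetical value type used only for illustration.
struct Point
{
    public int X;
    public int Y;
    public string Label;
}

static class ZeroFillDemo
{
    // Returns true when every element of a freshly allocated array is zero-filled.
    public static bool NewArrayIsZeroFilled()
    {
        var points = new Point[3]; // no constructor runs here
        foreach (var p in points)
        {
            if (p.X != 0 || p.Y != 0 || p.Label != null) return false;
        }
        return true;
    }

    // Calls Array.Initialize() on an int array and returns the array afterwards.
    public static int[] InitializeInts()
    {
        var numbers = new int[] { 1, 2, 3 };
        // int has no parameterless constructor to run; per the documentation
        // the array is left unmodified in that case.
        numbers.Initialize();
        return numbers;
    }

    static void Main()
    {
        Console.WriteLine(NewArrayIsZeroFilled());
        Console.WriteLine(string.Join(",", InitializeInts()));
    }
}
```

For a type with an effective parameterless constructor (for example one written in CIL), Initialize() is the hook that would run that constructor on each element.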
Also, why is it public?
So it can be called by code produced by such a hypothetical compiler even in a context where security restrictions prevent it from calling private methods of another assembly.
To me it looks like the Initialize() method runs through the array and recreates the value types within. With a new array you get a fresh, empty array, and you get the same with Array.Clear(); with Array.Initialize() you get an array full of freshly created value types (types and length based on the old array).
And that should be all of the difference.
Based on the CLR source, the method traverses each index of the array and initializes the value type at that index by calling the default constructor, similar to the initobj IL instruction (I wonder what happens when the constructor throws an exception, though). The method is public because calling a private method directly from IL would make it a bit unverifiable.
Today's C# compilers do not run a constructor on each element when creating an array; every element is simply set to the default value of its type. Parameterless constructors for value types were planned for C# 6 but were withdrawn before release and only arrived in C# 10; the CLR has supported them all along, so this method is needed for languages whose array creation semantics call them.
You can see the expected use in the test code:
https://github.com/dotnet/coreclr/blob/3015ff7afb4936a1c5c5856daa4e3482e6b390a9/tests/src/CoreMangLib/cti/system/array/arrayinitialize.cs
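Basically, it resets an array of non-intrinsic value types by running each element's parameterless constructor.
It does not seem like an amazingly useful tool. Note that the documentation says the array is left unmodified when the value type has no parameterless constructor, so for zeroing out arrays of non-intrinsic value data Array.Clear() is the better fit.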
I'm trying to understand the design decision behind this part of the language. I admit I'm very new to it all, but this is something that caught me out initially and I was wondering if I'm missing an obvious reason. Consider the following code:
List<int> MyList = new List<int>() { 5, 4, 3, 2, 1 };
int[] MyArray = {5,4,3,2,1};
//Sort the list
MyList.Sort();
//This was an instance method
//Sort the Array
Array.Sort(MyArray);
//This was a static method
Why are they not both implemented in the same way - intuitively to me it would make more sense if they were both instance methods?
The question is interesting because it reveals details of the .NET type system. Like value types, strings and delegate types, array types get special treatment in .NET. The most notable odd behavior is that you never explicitly declare an array type; the compiler takes care of it for you, with ample help from the jitter. System.Array is an abstract type; you get dedicated array types in the process of writing code, either by explicitly creating a type[] or by using generic classes that have an array in their underlying implementation.
In a largish program, having hundreds of array types is not unusual. That is okay, but there is overhead involved for each type: storage required for just the type, not for the objects of it. The biggest chunk of it is the so-called 'method table'. In a nutshell, it is a list of pointers to each instance method of the type. The class loader and the jitter work together to fill this table. It is commonly known as the 'v-table', but that isn't quite a match, since the table contains pointers to both non-virtual and virtual methods.
You can perhaps see where this leads: the designers were worried about having lots of types with big method tables, so they looked for ways to cut down on the overhead.
Array.Sort() was an obvious target.
The same issue is not relevant for generic types. A big nicety of generics, one of many, is that one method table can handle the method pointers for any type parameter that is a reference type.
You are comparing two different types of 'object containers':
MyList is a generic collection of type List<int>: List<T> is a wrapper class that represents a strongly typed list of objects, here of type int. The List<T> class itself provides methods to search, sort, and manipulate its contained objects.
MyArray is a basic data structure of type Array. The Array does not provide the same rich set of instance methods as the List. Arrays can be single-dimensional, multidimensional or jagged, whilst Lists out of the box are only single-dimensional.
Take a look at this question, it provides a richer discussion about these data types: Array versus List<T>: When to use which?
Without asking someone who was involved in the design of the original platform it's hard to know. But, here's my guess.
In older languages, like C, arrays are dumb data structures - they have no code of their own. Instead, they're manipulated by outside functions. As you move into an object-oriented framework, the closest equivalent is a dumb object (with minimal methods) manipulated by static methods.
So, my guess is that the implementation of .NET Arrays is more a symptom of C style thinking in the early days of development than anything else.
This likely has to do with inheritance. The Array class cannot be manually derived from. But oddly, you can declare an array of anything at all and get an instance of System.Array that is strongly typed, even before generics allowed you to have strongly typed collections. Array seems to be one of those magic parts of the framework.
Also notice that none of the instance methods provided on an array massively modify the array. SetValue() seems to be the only one that changes anything. The Array class itself provides many static methods that can change the content of the array, like Reverse() and Sort(). Not sure if that's significant - maybe someone here can give some background as to why that's the case.
In contrast, List<T> (which wasn't around in the 1.0 framework days) and classes like ArrayList (which was around back then) are just run-of-the-mill classes with no special meaning within the framework. They provide a common .Sort() instance method so that when you inherited from these classes, you'd get that functionality or could override it.
However, these kinds of sort methods have gone out of vogue anyway as extension methods like Linq's .OrderBy() style sorting have become the next evolution. You can query and sort arrays and Lists and any other enumerable object with the same mechanism now, which is really, really nice.
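To make the point about a single mechanism concrete, here is a small sketch. SortDemo and SortedCopy are names invented for this example; OrderBy and ToArray are the standard LINQ operators.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SortDemo
{
    // Works on any IEnumerable<int>, so arrays and lists share one code path.
    public static int[] SortedCopy(IEnumerable<int> source)
    {
        return source.OrderBy(x => x).ToArray();
    }

    static void Main()
    {
        int[] myArray = { 5, 4, 3, 2, 1 };
        var myList = new List<int> { 5, 4, 3, 2, 1 };

        Console.WriteLine(string.Join(",", SortedCopy(myArray))); // 1,2,3,4,5
        Console.WriteLine(string.Join(",", SortedCopy(myList)));  // 1,2,3,4,5

        // Unlike Array.Sort and List<T>.Sort, OrderBy leaves the source untouched.
        Console.WriteLine(myArray[0]); // 5
    }
}
```

The trade-off is that OrderBy produces a new sequence instead of sorting in place, so it costs an extra allocation.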
-- EDIT --
The other, more cynical answer may just be - that's how Java did it so Microsoft did it the same way in the 1.0 version of the framework since at that time they were busy playing catch-up.
One reason might be because Array.Sort was designed in .NET 1.0, which had no generics.
I'm not sure, but I'm thinking maybe just so that arrays are as close to Primitives as they can be.
I have code like the following:
struct A
{
void SomeMethod()
{
var items = Enumerable.Range(0, 10).Where(i => i == _field);
}
int _field;
}
... and then I get the following compiler error:
Anonymous methods inside structs can not access instance members of 'this'.
Can anybody explain what's going on here?
Captured variables are shared by reference, even when they are value types: the compiler hoists them into a heap-allocated closure object.
However, this in a value type (struct) is effectively a reference to the instance the method was called on and cannot be hoisted that way, and hence you cannot capture it.
Eric Lippert has a nice article on the surprises of capturing value types: The Truth About Value Types.
Note in response to the comment by Chris Sinclair:
As a quick fix, you can store the struct in a local variable: A thisA = this; var items = Enumerable.Range(0, 10).Where(i => i == thisA._field); – Chris Sinclair
Beware that this creates surprising situations: thisA is a copy, so its identity is not the same as this. More explicitly, if you keep the lambda around longer, it will hold the copy thisA, captured by reference, and not the actual instance that SomeMethod was called on.
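The copy semantics can be seen in a short sketch. This is a variation on the struct from the question, with assumptions: the field is made public and renamed Field, and a Matches method returns the query, so the effect is observable from outside.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

struct A
{
    public int Field;

    public A(int field) { Field = field; }

    // Copies 'this' into a local so the lambda can capture the copy.
    public IEnumerable<int> Matches()
    {
        A copy = this;
        return Enumerable.Range(0, 10).Where(i => i == copy.Field);
    }
}

static class CaptureDemo
{
    static void Main()
    {
        var a = new A(3);
        IEnumerable<int> query = a.Matches(); // deferred: nothing is evaluated yet

        a.Field = 7; // mutates 'a', not the captured copy

        Console.WriteLine(string.Join(",", query)); // 3
    }
}
```

The query still reports 3 because it reads the captured copy, not the instance that was later mutated.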
When you have an anonymous method, it is compiled into a new class, and that class has one method (the one you define). It also has a reference to each variable you used that was outside the scope of the anonymous method. It's important to emphasize that it is a reference to that variable, not a copy of it. "Lambdas close over variables, not values," as the saying goes. This means that if you close over a variable outside the scope of a lambda, and then change that variable after defining the anonymous method (but before invoking it), you will see the changed value when you do invoke it.
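A minimal sketch of "closing over variables, not values" (ClosureDemo and ReadAfterChange are illustrative names):

```csharp
using System;

static class ClosureDemo
{
    public static int ReadAfterChange()
    {
        int counter = 1;
        Func<int> read = () => counter; // captures the variable itself
        counter = 42;                   // changed after the lambda was created
        return read();                  // observes the new value
    }

    static void Main()
    {
        Console.WriteLine(ReadAfterChange()); // 42
    }
}
```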
So, what's the point of all of that? Well, if you were to close over this for a struct, which is a value type, it's possible for the lambda to outlive the struct. The anonymous method will be in a class, not a struct, so it will go on the heap, live as long as it needs to, and you are free to pass a reference to that class (directly or indirectly) wherever you want.
Now imagine that we have a local variable holding a struct of the type you've defined here. We use the method above to generate a lambda, and let's assume for a moment that the query items is returned (instead of the method being void). We could then store that query in another instance (instead of local) variable, and iterate over that query some time later in another method. What would happen here? In essence, we would have held onto a reference to a value type that was on the stack after it is no longer in scope.
What does that mean? The answer is, we have no idea. (Please look over the link; it's kinda the crux of my argument.) The data could just happen to be the same, it could have been zeroed out, it could have been filled by entirely different objects, there is no way of knowing. C# goes to great lengths, as a language, to prevent you from doing things like this. Languages such as C or C++ don't try so hard to stop you from shooting your own foot.
Now, in this particular case, it's possible that you aren't going to use the lambda outside of the scope of what this refers to, but the compiler doesn't know that, and if it lets you create the lambda it has no way of determining whether or not you expose it in a way that could result in it outliving this, so the only way to prevent this problem is to disallow some cases that aren't actually problematic.
Assume that 2 different methods - one static and one non-static - need an instance variable.
The variable is used 3-5 different times within the methods for comparison purposes.
The variable is NOT changed in any manner.
Also, would the type of the variable (String, Collection, etc.) make any difference to how it should be coded?
What is the best/right way of using Instance Variable within a private method (static and non-static)?
1) Pass as method argument
2) Store locally by using the method to get the value - this.getClaimPropertyVertices();
3) Store locally by getting the value - this.claimPropertyVertices;
4) Use the instance variable directly in the method
When creating a local variable to store the value, will the "final" keyword provide any advantages if the variable will not be changed?
Edit 1: Based on a comment, I am adding additional information
The value cannot be created locally in the method. It has to come from the class or some other method accessed by the class.
My Solution Based on the Answers:
Based on the answers by @EricJ. and @Jodrell, I went with option 1 and also created it as a private static method. I also found some details here to support this.
When creating a local variable to store the value will the "final" keyword provide any advantages, if the variable will not be changed
In Java, final provides an optimization opportunity to the compiler. It states that the variable will not be reassigned after initialization. The keyword readonly plays a similar role for fields in C#.
Whether or not that additional opportunity for optimization is meaningful depends on the specific problem. In many cases, the cost of other portions of the algorithm will be vastly larger than optimizations that the compiler is able to make due to final or readonly.
Use of those keywords has another benefit. They create a contract that the value will not change, which helps future maintainers of the code understand that they should not change the value (indeed, the compiler will not let them).
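As a sketch of that contract on the C# side (the Claim class and its members are hypothetical, loosely named after the question's claimPropertyVertices): readonly fixes the field's reference, while the object it points to can still change.

```csharp
using System;
using System.Collections.Generic;

class Claim
{
    // readonly: can only be assigned in the declaration or in a constructor.
    private readonly List<int> claimPropertyVertices;

    public Claim(IEnumerable<int> vertices)
    {
        claimPropertyVertices = new List<int>(vertices);
    }

    public int VertexCount => claimPropertyVertices.Count;

    public void AddVertex(int vertex)
    {
        // claimPropertyVertices = new List<int>(); // would not compile: field is readonly
        claimPropertyVertices.Add(vertex); // allowed: the reference is fixed, the list is not
    }
}

static class ReadonlyDemo
{
    static void Main()
    {
        var claim = new Claim(new[] { 1, 2, 3 });
        claim.AddVertex(4);
        Console.WriteLine(claim.VertexCount); // 4
    }
}
```

If the contents must not change either, exposing the data through a read-only interface or an immutable collection is the usual next step.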
What is the best/right way of using Instance Variable within a private method (static and non-static)?
Pass as method argument
The value is already stored in the instance. Why pass it? In the best case this is no better than using the instance property/field. In the worst case the JITter will not inline the call and will create a larger stack frame, costing a few CPU cycles. Note: if you are calling a static method, then you must pass the variable, as the static method cannot access the object instance.
Store locally by using the method to get the value - this.getClaimPropertyVertices();
This is what I do in general. Getters/setters are there to provide a meaningful wrapper around fields. In some cases, the getter will initialize the backing field (common pattern in C# when using serializers that do not call the object constructor. Don't get me started on that topic...).
Store locally by getting the value - this.claimPropertyVertices;
No, see above.
Use the instance variable directly in the method
Exactly the same as above. Using this or not using this should generate the exact same code.
UPDATE (based on your edit)
If the value is external to the object instance, and should not meaningfully be stored along with the instance, pass it in as a value to the method call.
If you write your functions with the static keyword whenever you can, there are several obvious benefits.
It's obvious from the signature what inputs affect the function.
You know that the function will have no side effects on instance state (unless you are passing by reference). This overlooks non-functional side effects, like changes to the GUI.
The function is not programmatically tied to the class; if you decide that logically its behaviour has a better association with another entity, you can just move it and then adjust any namespace references.
These benefits make the function easy to understand and simpler to reuse. They also make it simpler to use the function in a multithreaded context, since you don't have to worry about contention on ever-spreading side effects.
I will caveat this answer: you should write potentially reusable functions with the static keyword. Simple or obviously non-reusable functionality should just access the private member or getter, if implemented.
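A small sketch of the pattern, under assumptions: ClaimChecker, HasVertex and ContainsVertex are invented names, and the helper is internal here only so it can be exercised from outside the class.

```csharp
using System;
using System.Collections.Generic;

class ClaimChecker
{
    private readonly List<int> claimPropertyVertices = new List<int> { 3, 5, 8 };

    // Instance method: reads the field and hands it to the static helper.
    public bool HasVertex(int candidate)
    {
        return ContainsVertex(claimPropertyVertices, candidate);
    }

    // Static helper: everything it depends on appears in the signature,
    // and it cannot touch instance state by accident.
    internal static bool ContainsVertex(IReadOnlyList<int> vertices, int candidate)
    {
        for (int i = 0; i < vertices.Count; i++)
        {
            if (vertices[i] == candidate) return true;
        }
        return false;
    }

    static void Main()
    {
        var checker = new ClaimChecker();
        Console.WriteLine(checker.HasVertex(5)); // True
        Console.WriteLine(checker.HasVertex(4)); // False
    }
}
```

Because the helper takes a read-only view, it could be moved to another class without dragging any instance state along.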
From this Answer, I came to know that KeyValuePair is immutable.
I browsed through the docs, but could not find any information regarding immutable behavior.
I was wondering how to determine if a type is immutable or not?
I don't think there's a standard way to do this, since there is no official concept of immutability in C#. The only way I can think of is looking at certain things, indicating a higher probability:
1) All properties of the type have a private set
2) All fields are const/readonly or private
3) There are no methods with obvious/known side effects
4) Also, being a struct generally is a good indication (if it is a BCL type or written by someone who follows guidelines for this)
Something like an ImmutableAttribute would be nice. There are some thoughts here (somewhere down in the comments), but I haven't seen one in "real life" yet.
The first indication would be that the documentation for the property in the overview says "Gets the key in the key/value pair."
The second more definite indication would be in the description of the property itself:
"This property is read/only."
I don't think you can find "proof" of immutability by just looking at the docs, but there are several strong indicators:
It's a struct (why does this matter?)
It has no settable public properties (both are read-only)
It has no obvious mutator methods
For definitive proof I recommend downloading the BCL's reference source from Microsoft or using an IL decompiler to show you what a type would look like in code.
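Some of these indicators can also be checked with reflection. The following is only a heuristic sketch (ImmutabilityHints and LooksImmutable are made-up names); it inspects public setters and public fields and proves nothing about private state or mutating methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

static class ImmutabilityHints
{
    // Heuristic only: true when no public instance property has a public setter
    // and every public instance field is readonly.
    public static bool LooksImmutable(Type type)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;
        bool noPublicSetters = type.GetProperties(flags).All(p => p.GetSetMethod() == null);
        bool noWritableFields = type.GetFields(flags).All(f => f.IsInitOnly);
        return noPublicSetters && noWritableFields;
    }

    static void Main()
    {
        Console.WriteLine(LooksImmutable(typeof(KeyValuePair<int, int>))); // True
        Console.WriteLine(LooksImmutable(typeof(List<int>)));              // False
    }
}
```

List<int> fails the check because of its settable Capacity property and indexer, while KeyValuePair<int, int> exposes only getters.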
A KeyValuePair<T1,T2> is a struct which, absent Reflection, can only be mutated outside its constructor by copying the contents of another KeyValuePair<T1,T2> which holds the desired values. Note that the statement:
MyKeyValuePair = new KeyValuePair<int,int>(1,2);
like all similar constructor invocations on structures, actually works by creating a new temporary instance of KeyValuePair<int,int> (happens before the constructor itself executes), setting the field values of that instance (done by the constructor), copying all public and private fields of that new temporary instance to MyKeyValuePair, and then discarding the temporary instance.
Consider the following code:
static KeyValuePair<short,short> MyKeyValuePair; // Field in some class
// Thread1
MyKeyValuePair = new KeyValuePair<short,short>(1,1);
// ***
MyKeyValuePair = new KeyValuePair<short,short>(2,2);
// Thread2
st = MyKeyValuePair.ToString();
Because MyKeyValuePair is precisely four bytes in length, the second statement in Thread1 will update both fields simultaneously. Despite that, if the second statement in Thread1 executes between Thread2's evaluation of MyKeyValuePair.Key.ToString() and MyKeyValuePair.Value.ToString(), the second ToString() will act upon the new mutated value of the structure, even though the first already-completed ToString() operated upon the value before the mutation.
All non-trivial structs, regardless of how they are declared, have the same immutability rules for their fields: code which can change a struct can change its fields; code which cannot change a struct cannot change its fields. Some structs may force one to go through hoops to change one of their fields, but designing struct types to be "immutable" is neither necessary nor sufficient to ensure the immutability of instances. There are a few reasonable uses of "immutable" struct types, but such use cases if anything require more care than is necessary for structs with exposed public fields.
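A single-threaded sketch of the same rule (StructCopyDemo and ReplaceWholeValue are illustrative names): the struct offers no setters, yet any code that can assign to the variable can replace all of its fields at once.

```csharp
using System;
using System.Collections.Generic;

static class StructCopyDemo
{
    public static (int before, int after) ReplaceWholeValue()
    {
        var pair = new KeyValuePair<int, int>(1, 1);
        KeyValuePair<int, int> snapshot = pair; // independent copy of both fields

        // pair.Key = 2; // does not compile: Key has no setter
        pair = new KeyValuePair<int, int>(2, 2); // but the variable can be overwritten

        return (snapshot.Key, pair.Key);
    }

    static void Main()
    {
        var (before, after) = ReplaceWholeValue();
        Console.WriteLine(before); // 1
        Console.WriteLine(after);  // 2
    }
}
```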
I read all the questions related to this topic, and they all give reasons why a default constructor on a struct is not available in C#, but I have not yet found anyone who suggests a general course of action when confronted with this situation.
The obvious solution is to simply convert the struct to a class and deal with the consequences.
Are there other options to keep it as a struct?
I ran into this situation with one of our internal commerce API objects. The designer converted it from a class to a struct, and now the default constructor (which was private before) leaves the object in an invalid state.
I thought that if we're going to keep the object as a struct, a mechanism for checking the validity of the state should be introduced (something like an IsValid property). I was met with much resistance, and an explanation of "whoever uses the API should not use the default constructor," a comment which certainly raised my eyebrows. (Note: the object in question is constructed "properly" through static factory methods, and all other constructors are internal.)
Is everyone simply converting their structs to classes in this situation without a second thought?
Edit: I would like to see some suggestions about how to keep this type of object as a struct -- the object in question above is much better suited as a struct than as a class.
For a struct, you design the type so the default constructed instance (fields all zero) is a valid state. You don't [shall not] arbitrarily use struct instead of class without a good reason - there's nothing wrong with using an immutable reference type.
My suggestions:
Make sure the reason for using a struct is valid (a [real] profiler revealed significant performance problems resulting from heavy allocation of a very lightweight object).
Design the type so the default constructed instance is valid.
If the type's design is dictated by native/COM interop constraints, wrap the functionality and don't expose the struct outside the wrapper (private nested type). That way you can easily document and verify proper use of the constrained type requirements.
The reason for this is that a struct (a type derived from System.ValueType) is treated specially by the CLR: it is initialized with all its fields being 0 (or default). You don't really even need to create one - just declare it. This is why the implicit default constructor is always required.
You can get around this in two ways:
Create a property like IsValid to indicate if it is a valid struct, as you suggest, and
in .NET 2.0 and later, consider using Nullable<T> to allow an uninitialized (null) struct.
Changing the struct to a class can have some very subtle consequences (in terms of memory usage and object identity, which come up more in a multithreaded environment), and not-so-subtle but hard-to-debug NullReferenceExceptions for uninitialized objects.
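The Nullable<T> route might look like the sketch below. Price, NullableDemo and AmountOrFallback are hypothetical names; HasValue and Value are the standard Nullable<T> members.

```csharp
using System;

// Hypothetical struct for illustration.
struct Price
{
    public int Amount;

    public Price(int amount) { Amount = amount; }
}

static class NullableDemo
{
    // Price? is shorthand for Nullable<Price>; null stands for "not initialized".
    public static int AmountOrFallback(Price? price, int fallback)
    {
        return price.HasValue ? price.Value.Amount : fallback;
    }

    static void Main()
    {
        Price? missing = null;
        Console.WriteLine(AmountOrFallback(missing, -1));       // -1
        Console.WriteLine(AmountOrFallback(new Price(10), -1)); // 10
    }
}
```

This keeps the type a struct while moving the "is it set?" question into the wrapper instead of into the struct itself.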
The reason why there is no possibility to define a default constructor is illustrated by the following expression:
new MyStruct[1000];
You've got 3 options here
calling the default constructor 1000 times, or
creating corrupt data (note that a struct can contain references; if you don't initialize or blank out the reference, you could potentially access arbitrary memory), or
blank the allocated memory out with zeroes (at the byte level).
.NET does the same for both structs and classes: fields and array elements are blanked out with zeroes. This also gets more consistent behavior between structs and classes and no unsafe code. It also allows the .NET framework not to specialize something like new byte[1000].
And that's the default constructor for structs .NET demands and takes care of itself: zero out all bytes.
Now, to handle this, you've got a couple of options:
Add an Am-I-Initialized property to the struct (like HasValue on Nullable).
Allow the zeroed out struct to be a valid value (like 0 is a valid value for a decimal).
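Both options can be sketched with one hypothetical type. OrderId, Create and IsValid are invented for this example: zero is reserved to mean "never initialized", and a factory method is the only way to obtain a valid instance.

```csharp
using System;

// Hypothetical identifier type; all names here are illustrative.
struct OrderId
{
    private readonly int id; // 0 is reserved for "never initialized"

    private OrderId(int id) { this.id = id; }

    // Factory method: the only way to obtain a valid instance.
    public static OrderId Create(int id)
    {
        if (id <= 0) throw new ArgumentOutOfRangeException(nameof(id));
        return new OrderId(id);
    }

    // False for default(OrderId) and for untouched array elements.
    public bool IsValid => id != 0;

    public int Value
    {
        get
        {
            if (!IsValid) throw new InvalidOperationException("OrderId was never initialized.");
            return id;
        }
    }
}

static class ValidityDemo
{
    static void Main()
    {
        var ids = new OrderId[1000];                   // zero-filled, no constructor runs
        Console.WriteLine(ids[0].IsValid);             // False
        Console.WriteLine(OrderId.Create(42).IsValid); // True
    }
}
```

Whether a throwing Value getter or a silent default suits a given API better is a design choice; the sketch picks the stricter one.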