MSDN states that:
By using a collection initializer you do not have to specify multiple calls to the Add method of the class in your source code; the compiler adds the calls.
They also give this example, using the newer collection initialization syntax with brackets:
var numbers = new Dictionary<int, string> {
[7] = "seven",
[9] = "nine",
[13] = "thirteen"
};
However, when checking the generated IL code, it seems that this code does not result in any calls to the Add method at all, but rather in calls to set_Item, like so:
IL_0007: ldstr "seven"
IL_000c: callvirt instance void class [mscorlib]System.Collections.Generic.Dictionary`2<int32, string>::set_Item(!0/*int32*/, !1/*string*/)
The "old" syntax with curly brackets in contrast gives the following:
// C# code:
var numbers2 = new Dictionary<Int32, String>
{
{7, "seven"},
{9, "nine"},
{13, "thirteen"}
};
// IL code snippet:
// ----------
// IL_0033: ldstr "seven"
// IL_0038: callvirt instance void class [mscorlib]System.Collections.Generic.Dictionary`2<int32, string>::Add(!0/*int32*/, !1/*string*/)
... As you can see, a call to Add is the result, as expected. (One can only assume that the text on MSDN mentioned above is yet to be updated.)
I've so far discovered one case where this difference actually matters, and that is with the quirky System.Collections.Specialized.NameValueCollection. This one allows for one key to point to more than one value. Initialization can be done in both ways:
const String key = "sameKey";
const String value1 = "value1";
const String value2 = "value2";
var collection1 = new NameValueCollection
{
{key, value1},
{key, value2}
};
var collection2 = new NameValueCollection
{
[key] = value1,
[key] = value2
};
... But because only the former actually calls NameValueCollection::Add(string, string), the results differ when looking at the contents of each collection:
collection1[key] = "value1,value2"
collection2[key] = "value2"
I realize that there's a connection between the old syntax and the IEnumerable interface, and that the compiler finds the Add method by naming convention, etcetera. I also realize the benefits of any indexer type being subject to the new syntax, as discussed in this SO answer before.
Perhaps these are all expected features, from your point of view, but the implications had not occurred to me, and I'm curious to know more.
So, I wonder if there's a source of documentation at MSDN or elsewhere that clarifies this difference in behavior that comes with the choice of syntax. I also wonder if you know of any other examples where this choice may have such an impact as when initializing a NameValueCollection.
I suppose for the ultimate clarification, you have to go to the specification. The C# 6 spec is not 'officially' released, but there is an unofficial draft available.
What's interesting here is that, despite its location in the Programming Guide, the indexer syntax is not a collection initializer, it's an object initializer. From 7.6.11.3 'Collection Initializers':
A collection initializer consists of a sequence of element initializers, enclosed by { and } tokens and separated by
commas. Each element initializer specifies an element to be added to the collection object being initialized, and
consists of a list of expressions enclosed by { and } tokens and separated by commas. ... The collection object to which a collection initializer is applied must be of a type that implements
System.Collections.IEnumerable or a compile-time error occurs. For each specified element in order, the
collection initializer invokes an Add method on the target object with the expression list of the element initializer as
argument list
And from 7.6.11.2 'Object Initializers':
An object initializer consists of a sequence of member initializers, enclosed by { and } tokens and separated by
commas. Each member_initializer designates a target for the initialization. An identifier must name an accessible
field or property of the object being initialized, whereas an argument_list enclosed in square brackets must specify
arguments for an accessible indexer on the object being initialized.
Take this as an example:
public class ItemWithIndexer
{
private readonly Dictionary<string, string> _dictionary =
new Dictionary<string, string>();
public string this[string index]
{
get { return _dictionary[index]; }
set { _dictionary[index] = value; }
}
}
Note that this class does not meet the requirements to have a collection initializer applied: it does not implement IEnumerable or have an Add method, so any attempt to initialize in this way would result in a compile-time error. This object initializer targeting the indexer will compile and work, however (see this fiddle):
var item = new ItemWithIndexer
{
["1"] = "value"
};
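As for other cases where the choice of syntax matters, a plain Dictionary is probably the most common one: Add throws on a duplicate key, while the indexer setter silently overwrites. The following is only a minimal sketch; the class name InitializerDemo and its method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class InitializerDemo
{
    // Object initializer with indexers: compiles to set_Item calls,
    // so a repeated key overwrites the earlier value.
    public static Dictionary<string, int> WithIndexers()
    {
        return new Dictionary<string, int>
        {
            ["a"] = 1,
            ["a"] = 2
        };
    }

    // Collection initializer: compiles to Add calls,
    // so a repeated key throws ArgumentException at runtime.
    public static Dictionary<string, int> WithAdd()
    {
        return new Dictionary<string, int>
        {
            { "a", 1 },
            { "a", 2 }
        };
    }

    public static void Main()
    {
        Console.WriteLine(WithIndexers()["a"]); // prints 2

        try
        {
            WithAdd();
        }
        catch (ArgumentException)
        {
            Console.WriteLine("Add rejected the duplicate key");
        }
    }
}
```

So the indexer form behaves like "last write wins", whereas the Add form treats a repeated key as an error.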
I decompiled some C# 7 libraries and saw ValueTuple generics being used. What are ValueTuples and why not Tuple instead?
https://learn.microsoft.com/en-gb/dotnet/api/system.tuple
https://learn.microsoft.com/en-gb/dotnet/api/system.valuetuple
What are ValueTuples and why not Tuple instead?
A ValueTuple is a struct which reflects a tuple, same as the original System.Tuple class.
The main differences between Tuple and ValueTuple are:
System.ValueTuple is a value type (struct), while System.Tuple is a reference type (class). This is meaningful when talking about allocations and GC pressure.
System.ValueTuple isn't only a struct, it's a mutable one, and one has to be careful when using them as such. Think what happens when a class holds a System.ValueTuple as a field.
System.ValueTuple exposes its items via fields instead of properties.
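To make the "mutable struct held as a field" point concrete, here is a small sketch. The Holder class and its Point field are hypothetical names used only for this illustration.

```csharp
using System;

// Hypothetical holder type, only for illustration.
public class Holder
{
    public (int X, int Y) Point = (1, 2);
}

public static class CopyDemo
{
    // Reading the field into a local copies the whole struct,
    // so changing the local leaves the field untouched.
    public static int MutateCopy(Holder holder)
    {
        var copy = holder.Point;
        copy.X = 100;
        return holder.Point.X;
    }

    public static void Main()
    {
        var holder = new Holder();
        Console.WriteLine(MutateCopy(holder)); // prints 1

        // Writing through the field itself mutates it in place,
        // because the tuple elements are public writable fields.
        holder.Point.X = 5;
        Console.WriteLine(holder.Point.X); // prints 5
    }
}
```

Whether a given assignment touches the original or a copy depends on whether you go through the variable itself or through a value read from it, which is exactly what makes mutable structs easy to misuse.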
Until C# 7, using tuples wasn't very convenient. Their field names are Item1, Item2, etc, and the language hadn't supplied syntax sugar for them like most other languages do (Python, Scala).
When the .NET language design team decided to incorporate tuples and add syntax sugar for them at the language level, an important factor was performance. With ValueTuple being a value type, you can avoid GC pressure when using them because (as an implementation detail) they can typically live on the stack rather than being allocated on the heap.
Additionally, a struct gets automatic (shallow) equality semantics from the runtime, where a class doesn't. The design team nevertheless wanted an even more optimized equality for tuples, and so implemented a custom equality for them.
Here is a paragraph from the design notes of Tuples:
Struct or Class:
As mentioned, I propose to make tuple types structs rather than
classes, so that no allocation penalty is associated with them. They
should be as lightweight as possible.
Arguably, structs can end up being more costly, because assignment
copies a bigger value. So if they are assigned a lot more than they
are created, then structs would be a bad choice.
In their very motivation, though, tuples are ephemeral. You would use
them when the parts are more important than the whole. So the common
pattern would be to construct, return and immediately deconstruct
them. In this situation structs are clearly preferable.
Structs also have a number of other benefits, which will become
obvious in the following.
Examples:
You can easily see that working with System.Tuple becomes ambiguous very quickly. For example, say we have a method which calculates the sum and the count of an IEnumerable<int>:
public Tuple<int, int> DoStuff(IEnumerable<int> values)
{
var sum = 0;
var count = 0;
foreach (var value in values) { sum += value; count++; }
return Tuple.Create(sum, count); // System.Tuple is a static helper class; "new Tuple(...)" does not compile
}
On the receiving end, we end up with:
Tuple<int, int> result = DoStuff(Enumerable.Range(0, 10));
// What is Item1 and what is Item2?
// Which one is the sum and which is the count?
Console.WriteLine(result.Item1);
Console.WriteLine(result.Item2);
The way you can deconstruct value tuples into named arguments is the real power of the feature:
public (int sum, int count) DoStuff(IEnumerable<int> values)
{
var res = (sum: 0, count: 0);
foreach (var value in values) { res.sum += value; res.count++; }
return res;
}
And on the receiving end:
var result = DoStuff(Enumerable.Range(0, 10));
Console.WriteLine($"Sum: {result.sum}, Count: {result.count}");
Or:
var (sum, count) = DoStuff(Enumerable.Range(0, 10));
Console.WriteLine($"Sum: {sum}, Count: {count}");
Compiler goodies:
If we look under the cover of our previous example, we can see exactly how the compiler is interpreting ValueTuple when we ask it to deconstruct:
[return: TupleElementNames(new string[] {
"sum",
"count"
})]
public ValueTuple<int, int> DoStuff(IEnumerable<int> values)
{
ValueTuple<int, int> result;
result..ctor(0, 0);
foreach (int current in values)
{
result.Item1 += current;
result.Item2++;
}
return result;
}
public void Foo()
{
ValueTuple<int, int> expr_0E = this.DoStuff(Enumerable.Range(0, 10));
int item = expr_0E.Item1;
int arg_1A_0 = expr_0E.Item2;
}
Internally, the compiled code utilizes Item1 and Item2, but all of this is abstracted away from us since we work with a decomposed tuple. A tuple with named arguments gets annotated with the TupleElementNamesAttribute. If we use a single fresh variable instead of decomposing, we get:
public void Foo()
{
ValueTuple<int, int> valueTuple = this.DoStuff(Enumerable.Range(0, 10));
Console.WriteLine(string.Format("Sum: {0}, Count: {1})", valueTuple.Item1, valueTuple.Item2));
}
Note that the compiler still has to make some magic happen (via the attribute) when we debug our application, as it would be odd to see Item1, Item2.
The difference between Tuple and ValueTuple is that Tuple is a reference type and ValueTuple is a value type. The latter is desirable because changes to the language in C# 7 have tuples being used much more frequently, but allocating a new object on the heap for every tuple is a performance concern, particularly when it's unnecessary.
However, in C# 7, the idea is that you never have to explicitly use either type because of the syntax sugar being added for tuple use. For example, in C# 6, if you wanted to use a tuple to return a value, you would have to do the following:
public Tuple<string, int> GetValues()
{
// ...
return Tuple.Create(stringVal, intVal); // "new Tuple(...)" does not compile; use Tuple.Create or new Tuple<string, int>(...)
}
var value = GetValues();
string s = value.Item1;
However, in C# 7, you can use this:
public (string, int) GetValues()
{
// ...
return (stringVal, intVal);
}
var value = GetValues();
string s = value.Item1;
You can even go a step further and give the values names:
public (string S, int I) GetValues()
{
// ...
return (stringVal, intVal);
}
var value = GetValues();
string s = value.S;
... Or deconstruct the tuple entirely:
public (string S, int I) GetValues()
{
// ...
return (stringVal, intVal);
}
var (S, I) = GetValues();
string s = S;
Tuples weren't often used in C# pre-7 because they were cumbersome and verbose, and only really used in cases where building a data class/struct for just a single instance of work would be more trouble than it was worth. But in C# 7, tuples have language-level support now, so using them is much cleaner and more useful.
I looked at the source for both Tuple and ValueTuple. The difference is that Tuple is a class and ValueTuple is a struct that implements IEquatable.
That means that Tuple == Tuple will return false if they are not the same instance, but ValueTuple == ValueTuple will return true if they are of the same type and Equals returns true for each of the values they contain.
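A short sketch of that difference follows. The class and method names are invented for the example, and it uses Equals for the value tuples because the == operator on tuple expressions is only available from C# 7.3 onward.

```csharp
using System;

public static class EqualityDemo
{
    // System.Tuple does not overload ==, so this is a reference comparison.
    public static bool ReferenceTuplesOperator()
    {
        Tuple<int, string> a = Tuple.Create(1, "a");
        Tuple<int, string> b = Tuple.Create(1, "a");
        return a == b;
    }

    // ValueTuple<T1, T2> implements IEquatable and compares element by element.
    public static bool ValueTuplesEquals()
    {
        var a = (1, "a");
        var b = (1, "a");
        return a.Equals(b);
    }

    public static void Main()
    {
        Console.WriteLine(ReferenceTuplesOperator()); // prints False
        Console.WriteLine(ValueTuplesEquals());       // prints True
    }
}
```

Note that calling Equals on two System.Tuple instances also compares element by element; it is only the == operator that falls back to reference identity for the class.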
In addition to the comments above, one unfortunate gotcha of ValueTuple is that the element names are a compile-time feature: they get erased when compiled to IL, so they're not available for serialisation at runtime.
i.e. Your sweet named arguments will still end up as "Item1", "Item2", etc. when serialised via e.g. Json.NET.
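You can see the erasure with plain reflection, without involving any serializer. This is just a sketch; NameErasureDemo and RuntimeFieldNames are names I made up for it.

```csharp
using System;
using System.Linq;

public static class NameErasureDemo
{
    // Returns the public field names that reflection sees on a named tuple.
    public static string[] RuntimeFieldNames()
    {
        var pair = (sum: 10, count: 3);
        return pair.GetType()
                   .GetFields()
                   .Select(field => field.Name)
                   .OrderBy(name => name)
                   .ToArray();
    }

    public static void Main()
    {
        // "sum" and "count" exist only for the compiler.
        Console.WriteLine(string.Join(", ", RuntimeFieldNames())); // prints Item1, Item2
    }
}
```

A reflection-based serializer sees the same thing, which is why the names do not survive.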
Other answers forgot to mention some important points. Instead of rephrasing, I'm going to quote the XML documentation from the source code:
The ValueTuple types (from arity 0 to 8) comprise the runtime implementation that underlies
tuples in C# and struct tuples in F#.
Aside from created via language syntax, they are most easily created via the
ValueTuple.Create factory methods.
The System.ValueTuple types differ from the System.Tuple types in that:
they are structs rather than classes,
they are mutable rather than readonly, and
their members (such as Item1, Item2, etc) are fields rather than properties.
With introduction of this type and C# 7.0 compiler, you can easily write
(int, string) idAndName = (1, "John");
And return two values from a method:
private (int, string) GetIdAndName()
{
//.....
return (id, name);
}
Contrary to System.Tuple you can update its members (Mutable) because they are public read-write Fields that can be given meaningful names:
(int id, string name) idAndName = (1, "John");
idAndName.name = "New Name";
Late-joining to add a quick clarification on these two factoids:
they are structs rather than classes
they are mutable rather than readonly
One would think that changing value-tuples en-masse would be straightforward:
foreach (var x in listOfValueTuples) { x.Foo = 103; } // won't even compile: x is a read-only iteration variable holding a copy of the struct
var d = listOfValueTuples[0].Foo;
Someone might try to workaround this like so:
// initially *.Foo = 10 for all items
listOfValueTuples.Select(x => x.Foo = 103);
var d = listOfValueTuples[0].Foo; // 'd' should be 103 right? wrong! it is '10'
The reason for this quirky behavior is that the value-tuples are exactly value-based (structs) and thus the .Select(...) call works on cloned-structs rather than on the originals. To resolve this we must resort to:
// initially *.Foo = 10 for all items
listOfValueTuples = listOfValueTuples
.Select(x => {
x.Foo = 103;
return x;
})
.ToList();
var d = listOfValueTuples[0].Foo; // 'd' is now 103 indeed
Alternatively of course one might try the straightforward approach:
for (var i = 0; i < listOfValueTuples.Length; i++) {
listOfValueTuples[i].Foo = 103; // this works just fine when listOfValueTuples is an array
// alternative approach, required for a List<T> (use .Count there) because its indexer returns a copy:
//
// var x = listOfValueTuples[i];
// x.Foo = 103;
// listOfValueTuples[i] = x; // <-- vital: if you omit this, the change won't be saved to the original list
}
var d = listOfValueTuples[0].Foo; // 'd' is now 103 indeed
Hope this helps someone struggling to make heads or tails of list-hosted value-tuples.
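For completeness, here is a self-contained sketch of the copy, modify, write-back pattern on a List of value tuples. The element names Name and Foo and the helper SetAllFoo are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

public static class ListUpdateDemo
{
    // Sets Foo on every element by copying the struct out,
    // changing the copy, and storing it back into the list.
    public static void SetAllFoo(List<(string Name, int Foo)> items, int value)
    {
        for (var i = 0; i < items.Count; i++)
        {
            var item = items[i]; // the List<T> indexer hands back a copy
            item.Foo = value;    // modifies only the copy
            items[i] = item;     // write the modified copy back
        }
    }

    public static void Main()
    {
        var items = new List<(string Name, int Foo)>
        {
            ("first", 10),
            ("second", 10)
        };

        SetAllFoo(items, 103);
        Console.WriteLine(items[0].Foo); // prints 103
        Console.WriteLine(items[1].Foo); // prints 103
    }
}
```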
I read "Meanings of declaring, instantiating, initializing and assigning an object", which explains what initializing a variable means. But it doesn't explain what "initializing" an instance of a class means.
public class Test
{
public static void Main()
{
Person person1 = new Person();
}
}
public class Person
{
// body
}
What does it mean to initialize an instance of a class?
Yeah, I don't like the "initialize" part of the linked answer so much either. It really only talks about giving a value to a single variable and doesn't draw any distinction between instantiation and assignment (the same lines of code are found in all of them), so for me it's a bit vague. We do have more specific processes (especially with modern C# syntax) when we talk about initialization.
Initialize usually means "to give a created instance some initial values". Your class Person has nothing to initialize, so you could say that just by making it anew (instantiating it) you've also done all the initialization possible and it's ready for use.
Let's have something we can set values on
public class Person{
public string Name {get;set;}
public string Address {get;set;}
public Person(string name){
if(name == null) throw new ArgumentNullException(nameof(name));
Name = name;
}
}
Initializing as part of construction:
var p = new Person("John");
Constructors force us to supply values and are used to ensure a developer gives a class the minimum set of data it needs to work: a Person must have a name, while Address is optional. We have created a person with the name initialized to John.
Initializing post construction
You can give an instance additional (optional) values after you construct it, either like
p = new Person("Bill");
p.Address = "1 Microsoft Way";
Or
p = new Person("Bill"){
Address = "1 Microsoft Way"
};
Which is a syntactic sugar the compiler unrolls to something like the first. We refer to everything in the { } brackets of the second example as "an object initializer". An important distinction here though is that the first form (p.Address=...) is not considered to be initialization by the compiler. If you made the address property like:
public string Address {get;init;}
Then it can only be set in a constructor or in an object initializer, which is the latter form above. The p.Address=... form would result in a compiler error if the property were declared with init
Props set just after construction are part of the initialization process (as an English/linguistic thing) though I wouldn't call it init if it was any further down the line, such as
p = new Person("Sam");
string addr = Console.ReadLine();
p.Address = addr; //not initialization
You might find cases where people talk about initialization in the sense for "the first time a variable or property is given a value" but that's also more a linguistic/English thing than a c# thing
The compiler knows how to perform other initialization, so we also call things like this "an initializer":
string[] x = new string[] {"a","b","c"};
The process of giving the array those 3 values is initialization, and the compiler will even infer the array type from the element values, so an array can be both typed and initialized from the data:
var x = new[] {"a","b","c"};
I have a program with code like this:
var subtree = new Tree<int>(5, EnumeratorOrder.BreadthFirstSearch) { 1, 2 };
var tree = new Tree<int>(7, EnumeratorOrder.BreadthFirstSearch) { subtree, 10, 15 };
I can't understand what { 1, 2 } means.
I can't understand what { 1, 2 } means
The {1, 2} are Collection Initializers.
They represent a short-hand version of
var temp = new Tree<int>(5, EnumeratorOrder.BreadthFirstSearch);
temp.Add(1);
temp.Add(2);
var subtree = temp;
Note regarding initial assignment to temp: The meaning of assignment is evaluate the left, evaluate the right, do the assignment. Evaluating the right produces side effects, and those effects must be ordered before the effect of the assignment. See comments for a full discussion.
It's a collection initializer.
Collection initializers let you specify one or more element initializers when you initialize a collection class that implements IEnumerable or a class with an Add extension method. The element initializers can be a simple value, an expression or an object initializer. By using a collection initializer you do not have to specify multiple calls to the Add method of the class in your source code; the compiler adds the calls.
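To show what a type needs in order to support this syntax, here is a minimal sketch. Bag is a hypothetical stand-in for the Tree<int> class from the question (whose source isn't shown), with two Add overloads so that both numbers and other bags can appear in the braces.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for Tree<int>; only IEnumerable plus Add are needed.
public class Bag : IEnumerable<int>
{
    private readonly List<int> _items = new List<int>();

    // Called for each plain number in the initializer.
    public void Add(int item) { _items.Add(item); }

    // Called when another Bag appears in the initializer.
    public void Add(Bag other) { _items.AddRange(other._items); }

    public IEnumerator<int> GetEnumerator() { return _items.GetEnumerator(); }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class BagDemo
{
    public static Bag Build()
    {
        var inner = new Bag { 1, 2 };      // two calls to Add(int)
        return new Bag { inner, 10, 15 };  // Add(Bag), then Add(int) twice
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Build())); // prints 1, 2, 10, 15
    }
}
```

The compiler picks the Add overload per element by ordinary overload resolution, which is presumably how the Tree<int> in the question accepts both a subtree and plain integers.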
I did an experiment in C#, first I created a class library called "ClassLibrary1", with code below:
public class ClassLibrary1
{
public static void f()
{
var m = new { m_s = "abc", m_l = 2L };
Console.WriteLine(m.GetType());
}
}
Note, I removed namespace information generated by IDE.
Then I created a console application that references ClassLibrary1, with the code below (namespace also removed):
class Program
{
static void Main()
{
var m = new {m_s = "xyz", m_l = 5L};
Console.WriteLine(m.GetType());
ClassLibrary1.f();
}
}
I run the program, it prints:
<>f__AnonymousType0`2[System.String,System.Int64]
<>f__AnonymousType0`2[System.String,System.Int64]
Press any key to continue . . .
The output suggests that the 2 anonymous classes defined in the class library and the console application have an identical class type.
My question is: how does a C# binary store the type information for all the classes it contains? If it's stored in a global place, then when the exe is built with the dll reference, the same anonymous type information is there twice, so
(1) Is name duplication an error that should be avoided?
(2) If it is not an error, as my test suggests, how can a C# binary store duplicate type information?
(3) And at runtime, what's the rule for looking up type information to create real objects?
This seems a bit confusing in my example.
Thanks.
I removed namespace information
Irrelevant. Anonymous types for an assembly are generated in the same namespace, namely an empty one.
Furthermore, see C# specs 7.6.10.6 Anonymous object creation expressions:
Within the same program, two anonymous object initializers that specify a sequence of properties of the same names and compile-time types in the same order will produce instances of the same anonymous type.
Confusingly, "program" here means "assembly". So:
how does C# binary store its type information for all the classes it contains? If it's stored in a global place, when the exe is built with dll reference, 2 same anonymous type information is there
That's right, but the types are unique per assembly. They can have the same type name, because they're in a different assembly. You can see that by printing m.GetType().AssemblyQualifiedName, which will include the assembly name.
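Within a single assembly you can check the spec rule directly. This is only a sketch with made-up class and method names; it cannot show the cross-assembly case in one file, but printing AssemblyQualifiedName shows where the assembly name ends up.

```csharp
using System;

public static class AnonymousTypeDemo
{
    // Same property names, types and order: the compiler reuses one type.
    public static bool SameShapeSameType()
    {
        var a = new { m_s = "abc", m_l = 2L };
        var b = new { m_s = "xyz", m_l = 5L };
        return a.GetType() == b.GetType();
    }

    // Same properties in a different order: a different anonymous type.
    public static bool DifferentOrderSameType()
    {
        var a = new { m_s = "abc", m_l = 2L };
        var c = new { m_l = 2L, m_s = "abc" };
        return a.GetType() == c.GetType();
    }

    public static void Main()
    {
        Console.WriteLine(SameShapeSameType());      // prints True
        Console.WriteLine(DifferentOrderSameType()); // prints False

        // The qualified name includes the assembly that defines the type.
        var m = new { m_s = "abc", m_l = 2L };
        Console.WriteLine(m.GetType().AssemblyQualifiedName);
    }
}
```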
It is possible to have duplicate names in the .NET assembly, because metadata items (classes, fields, properties etc) are referenced internally by numeric metadata token, not by the name
Although the use of duplicate names is restricted in ECMA-335 (except several special cases), this possibility is exploited by a number of obfuscators, and, probably, by the compilers in cases when the name of the metadata item (class in your case) is not directly exposed to the user code
EDIT: CodeCaster is right with his answer, the names reside in different assemblies in your case, hence the duplicate names. Though I believe my point with having duplicate names in the same assembly is valid, but may not be applicable to this particular question.
(Note, I'm using the reversed prime ‵ character here where the grave accent character is in the code, since that has a special meaning in markdown, and they look similar. This may not work on all browsers).
My question is: how does C# binary store its type information for all the classes it contains?
The same way it stores any other class. There is no such thing as an anonymous type in .NET, it's something that the C# (and other .NET languages) provide by compiling to what at the CIL level is a perfectly normal class with a perfectly normal name; because at the CIL level there's nothing special about the name <>f__AnonymousType‵2[System.String,System.Int64] though its being an illegal name in C#, VB.NET and many other languages has the advantage of avoiding direct use that would be inappropriate.
If it's stored in a global place, when the exe is built with dll reference, 2 same anonymous type information is there.
Try changing your Console.WriteLine(m.GetType()) to Console.WriteLine(m.GetType().AssemblyQualifiedName) and you'll see that they aren't the same type.
Is name duplication an error that should be avoided?
No, because CIL produced uses the AssemblyQualifiedName if it deals with classes from other assemblies.
If not an error like I tested, how could C# binary store duplicate type information?
The error was not in what you looked at, but in how you looked at it. There is no duplication.
And in runtime, what's the rule to look up type information to create real objects?
The type gets compiled directly into the calls, with the lookup happening at that point. Consider your f():
public static void f()
{
var m = new { m_s = "abc", m_l = 2L };
Console.WriteLine(m.GetType());
}
That is compiled to two things. First, the anonymous type goes into a list of definitions of anonymous types in the assembly, and they are all compiled into the equivalent of:
internal class SomeImpossibleName<M_SType, M_LType>
{
private readonly M_SType _m_s;
private readonly M_LType _m_l;
public SomeImpossibleName(M_SType s, M_LType l)
{
_m_s = s;
_m_l = l;
}
public M_SType m_s
{
get { return _m_s; }
}
public M_LType m_l
{
get { return _m_l; }
}
public override bool Equals(object value)
{
var compareWith = value as SomeImpossibleName<M_SType, M_LType>;
if(compareWith == null)
return false;
if(!EqualityComparer<M_SType>.Default.Equals(_m_s, compareWith._m_s))
return false;
return EqualityComparer<M_LType>.Default.Equals(_m_l, compareWith._m_l);
}
public override int GetHashCode()
{
unchecked
{
return (-143687205 * -1521134295 + EqualityComparer<M_SType>.Default.GetHashCode(_m_s))
* 1521134295 + EqualityComparer<M_LType>.Default.GetHashCode(_m_l);
}
}
public override string ToString()
{
return new StringBuilder().Append("{ m_s = ")
.Append((object)_m_s)
.Append(", m_l = ")
.Append((object)_m_l)
.Append(" }")
.ToString();
}
}
Some things to note here:
This uses a generic type, to save on the compiled size if you had a bunch of different classes with an m_s followed by an m_l of different types.
This allows for a simple but reasonable comparison between objects of the same type, without which GroupBy and Distinct would not work.
I called this SomeImpossibleName<M_SType, M_LType>; the real name would be <>f__AnonymousType0<<m_s>j__TPar, <m_l>j__TPar>. That is, not only is the main part of the name impossible in C#, but so are the names of the type parameters.
If you have two methods that each do new Something{ m_s = "abc", m_l = 2L } they will both use this type.
The constructor is optimised. While in C# generally calling var x = new Something{ m_s = "abc", m_l = 2L } is the same as calling var x = new Something(); x.m_s = "abc"; x.m_l = 2L; the code created for doing so with an anonymous type is actually the equivalent of var x = new Something("abc", 2L). This gives a performance benefit and, more importantly, allows anonymous types to be immutable, even though the object initializer form only works with named types if they are mutable.
Also the following CIL for the method:
.method public hidebysig static void f () cil managed
{
.maxstack 2
.locals init
(
[0] class '<>f__AnonymousType0`2'<string, int64>
)
// Push the string "abc" onto the stack.
ldstr "abc"
// Push the number 2 onto the stack as an int32
ldc.i4.2
// Pop the top value from the stack, convert it to an int64 and push that onto the stack.
conv.i8
// Allocate a new object and call the '<>f__AnonymousType0`2'<string, int64> constructor.
// (This call will make use of the string and long because that's how the constructor is defined.)
newobj instance void class '<>f__AnonymousType0`2'<string, int64>::.ctor(!0, !1)
// Store the object in the locals array, and then take it out again.
// (Yes, this is a waste of time, but it often isn't and so the compiler sometimes adds in these
// stores).
stloc.0
ldloc.0
// Call GetType() which will pop the current value off the stack (the object) and push on
// The result of GetType()
callvirt instance class [mscorlib]System.Type [mscorlib]System.Object::GetType()
// Call WriteLine, which is a static method, so it doesn't need a System.Console item
// on the stack, but which takes an object parameter from the stack.
call void [mscorlib]System.Console::WriteLine(object)
// Return
ret
}
Now, some things to note here. Notice how all the calls to methods defined in the mscorlib assembly are prefixed with [mscorlib]. All calls across assemblies use this form. So too do all uses of classes across assemblies. As such, if two assemblies both have a <>f__AnonymousType0‵2 class, they will not cause a collision: internal calls would use <>f__AnonymousType0‵2 and calls to the other assembly would use [Some.Assembly.Name]<>f__AnonymousType0‵2.
The other thing to note is the newobj instance void class '<>f__AnonymousType0‵2'<string, int64>::.ctor(!0, !1) which is the answer to your question, "And in runtime, what's the rule to look up type information to create real objects?". It isn't looked up at runtime at all, but the call to the relevant constructor is determined at compile time.
Conversely, there's nothing to stop you from having non-anonymous types with the exact same name in different assemblies. Add an explicit reference to mscorlib to a console application project, change its alias from the default global to global, mscorlib and then try this:
namespace System.Collections.Generic
{
extern alias mscorlib;
public class List<T>
{
public string Count
{
get{ return "This is a very strange “Count”, isn’t it?"; }
}
}
class Program
{
public static void Main(string[] args)
{
var myList = new System.Collections.Generic.List<int>();
var theirList = new mscorlib::System.Collections.Generic.List<int>();
Console.WriteLine(myList.Count);
Console.WriteLine(theirList.Count);
Console.Read();
}
}
}
While there's a collision on the name System.Collections.Generic.List, the use of extern alias allows us to specify which assembly the compiler should look in for it, so we can use both versions side by side. Of course we wouldn't want to do this, and it's a lot of hassle and confusion, but compilers don't get hassled or confused in the same way.
I have a field that is of type 'object'. When I inspect it within the Watch window of Visual Studio I see it's an Object[], and when I drill into the elements I see each element is a string.
But when I try to cast this to a String[] I get this error:
Cannot cast 'MyObject' (which has an actual type of 'object[]') to 'string[]' string[]
Any reason why I can't do this cast? What is the best way to convert this object to a string array?
This is a particularly confusing feature of C#. Here's the deal.
Throughout this discussion we assume that the element type of an array is a reference type, not a value type.
C# supports unsafe array covariance. That means that if you have an array of string, you can convert it to an array of object, because a string can be converted to an object:
string[] a1 = { "hello", "goodbye" };
object[] a2 = a1; // Legal
If you then try to get an element out of a2, it works:
object o3 = a2[0];
That's legal because a2[0] is really a1[0], which is a string, which is convertible to object.
However, if you attempt to write to the array then you'll get an error at runtime:
a2[0] = new object();
This fails at runtime because a2 is really an array of strings, and you can't put a non-string into an array of strings.
So C# is already horribly broken; it is possible to write a program that compiles and looks normal but suddenly crashes with a type exception at runtime because you tried to put an object into an array of objects that is not actually an array of objects.
The feature you want is even more broken than that, and thank goodness C# does not support it. The feature you want is:
object[] a4 = { "Hello" };
string[] a5 = a4;
That would be unsafe array contravariance. It breaks horribly like this:
a4[0] = new Customer(); // Perfectly legal
string s6 = a5[0];
And now we just copied a Customer into a variable of type string.
You should avoid any kind of array covariance or contravariance; array contravariance is, as you've discovered, not legal, and array covariance is making little time bombs in your program that go off unexpectedly. Make your arrays of the right type to begin with.
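The runtime behaviour described above can be observed with a few lines. This is a sketch with invented names, not part of the original discussion; it shows the covariant write failing and why the cast in the question is rejected.

```csharp
using System;

public static class CovarianceDemo
{
    // Writing a non-string through a covariant object[] reference
    // to a real string[] fails with ArrayTypeMismatchException.
    public static bool CovariantWriteThrows()
    {
        string[] strings = { "hello", "goodbye" };
        object[] objects = strings; // legal: array covariance

        try
        {
            objects[0] = new object();
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    // An array created as object[] is never a string[],
    // even if every element happens to be a string.
    public static bool RealObjectArrayIsStringArray()
    {
        object value = new object[] { "Hello" };
        return value is string[];
    }

    public static void Main()
    {
        Console.WriteLine(CovariantWriteThrows());         // prints True
        Console.WriteLine(RealObjectArrayIsStringArray()); // prints False
    }
}
```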
string[] newarr = Array.ConvertAll(objects, s => (string)s);
--EDIT--
since you've said I have an object (knowing that it is an object[] actually)
string[] newarr = Array.ConvertAll((object[])objects, s => (string)s);
object[] original = new object[]{"1", "2"};
//some code in between here
object obj = original ;
object[] objArray = (object[])obj;
string[] newArray = new string[objArray.Length];
for(int i = 0; i < newArray.Length; i++)
{
newArray[i] = (string)objArray[i];
}
Other answers here are showing you quicker/shorter ways of doing the conversion. I wrote the whole thing out like this because it shows what's really going on and what needs to happen. You should use one of the simpler methods in your actual production code.
The rule in object oriented programming is -
"A derived class can always be cast to its base class" AND
"A base class can be cast to a derived class only if the instance the base-class reference currently holds is actually of the derived class"
e.g. (A is base and B is derived)
A a = new B(); // legal;
B b = (B) a ; // legal as "a" is actually B (in first statement)
Illegal:
A a = new A();
B b = (B) a; // not legal as "a" is A only.
The same thing applies to the Object and String classes. Object is the base class and String is the derived class.
You can convert a real string[] to object[].
This is array covariance.
You can find a clear example in the link.
You should cast each element in the collection and not the collection itself.
object[] ovalues = new object[] { "alpha", "beta" };
string[] svalues = ovalues.Cast<string>().ToArray();