C# Array with specific LowerBound [duplicate]

Although it is perhaps a bizarre thing to want to do, I need to create an Array in .NET with a lower bound > 0. This at first seems to be possible, using:
Array.CreateInstance(typeof(Object), new int[] {2}, new int[] {9});
This produces the desired result (an array of objects with a lower bound set to 9). However, the created array instance can no longer be passed to other methods expecting Object[]; it gives me an error saying that
System.Object[*] cannot be cast to System.Object[]. What is the difference between these array types, and how can I overcome this?
Edit: test code =
Object x = Array.CreateInstance(typeof(Object), new int[] {2}, new int[] {9});
Object[] y = (Object[])x;
Which fails with: "Unable to cast object of type 'System.Object[*]' to type 'System.Object[]'."
I would also like to note that this approach DOES work when using multiple dimensions:
Object x = Array.CreateInstance(typeof(Object), new int[] {2,2}, new int[] {9,9});
Object[,] y = (Object[,])x;
Which works fine.

The reason why you can't cast from one to the other is that this is evil.
Let's say you create an array of object[5..9] and you pass it to a function F as an object[].
How would the function know that this is a 5..9 array? F is expecting a general array but it's getting a constrained one. You could say it's possible for it to know, but this is still unexpected, and people don't want to make all sorts of boundary checks every time they want to use a simple array.
An array is the simplest structure in programming; making it too complicated makes it unusable. You probably need another structure.
What you could do is write a class that is a constrained collection that mimics the behaviour you want. That way, all users of that class will know what to expect.
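using System.Collections.Generic;

class ConstrainedArray<T> : IEnumerable<T> where T : new()
{
    public ConstrainedArray(int min, int max)
    {
        // Record the bounds; the indexer needs Min to translate indices.
        Min = min;
        Max = max;
        // Max is treated as inclusive here, so 5..9 holds 5 elements.
        array = new T[max - min + 1];
    }
    public T this[int index]
    {
        get { return array[index - Min]; }
        set { array[index - Min] = value; }
    }
    public int Min { get; private set; }
    public int Max { get; private set; }
    T[] array;
    public IEnumerator<T> GetEnumerator()
    {
        // T[] exposes the generic enumerator only through IEnumerable<T>.
        return ((IEnumerable<T>)array).GetEnumerator();
    }
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return array.GetEnumerator();
    }
}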

I'm not sure why that can't be passed as Object[], but wouldn't it be easier if you just created a real class to wrap an array and handle your "weird logic" in there?
You'd get the benefits of using a real reference object where you could add "intelligence" to your class.
Edit: How are you casting your Array, could you post some more code? Thanks.

Just store your lower bound in a const offset integer, and subtract that value from whatever your source returns as the index.
Also: this is an old VB6 feature. I think there might be an attribute to help support it.

The .NET CLR differentiates between two internal array object formats: SZ arrays and MZ arrays. MZ arrays can be multi-dimensional and store their lower bounds in the object.
The reason for this difference is two-fold:
Efficient code generation for single-dimensional arrays requires that there is no lower bound. Having a lower bound is incredibly uncommon. We would not want to sacrifice significant performance in the common case for this rarely used feature.
Most code expects arrays with zero lower bound. We certainly don't want to pollute all of our code with checking the lower bound or adjusting loop bounds.
These concerns are solved by making a separate CLR type for SZ arrays. This is the type that almost all practically occurring arrays are using.
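The split between the two array kinds can be observed through reflection. The following is only a minimal sketch, assuming nothing beyond the standard `Type.MakeArrayType` overloads; the class and method names are made up for illustration.

```csharp
using System;

static class ArrayKinds
{
    // Vector (SZ array) type: one dimension, lower bound fixed at zero.
    public static Type Vector() => typeof(object).MakeArrayType();

    // General array type of rank 1: this form may carry a non-zero lower bound.
    public static Type RankOne() => typeof(object).MakeArrayType(1);

    // True only for the SZ form; the rank-1 general array is a different runtime type.
    public static bool IsObjectVector(Type t) => t == typeof(object[]);
}
```

`ArrayKinds.IsObjectVector(ArrayKinds.Vector())` is true, while the same check on `ArrayKinds.RankOne()` is false, which is consistent with the failing cast in the question.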

I know it's an old question, but to fully explain it:
If a type (in this case a single-dimensional array with a lower bound > 0) can't be created by typed code, then an instance of it created through reflection can't be consumed by typed code either.
What you have noticed is already in documentation:
https://learn.microsoft.com/en-us/dotnet/framework/reflection-and-codedom/specifying-fully-qualified-type-names
Note that from a runtime point of view, MyArray[] != MyArray[*], but
for multidimensional arrays, the two notations are equivalent. That
is, Type.GetType("MyArray [,]") == Type.GetType("MyArray[*,*]")
evaluates to true.
In C#/VB/... you can keep that reflected array in an object, pass it around as an object, and use only reflection to access its items.
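As a rough sketch of that approach, the array from the question can be created and accessed purely through the `System.Array` API, without ever casting it to `Object[]`. The class and method names below are hypothetical.

```csharp
using System;

static class NonZeroBasedArrayDemo
{
    // Creates an object array with 2 elements whose first valid index is 9.
    public static Array Create() =>
        Array.CreateInstance(typeof(object), new[] { 2 }, new[] { 9 });

    // Stores each index in its own slot, using only the Array API.
    public static void FillWithIndices(Array array)
    {
        for (int i = array.GetLowerBound(0); i <= array.GetUpperBound(0); i++)
        {
            // SetValue(value, index): the int value is boxed into the object slot.
            array.SetValue(i, i);
        }
    }
}
```

Reading goes the same way, through `array.GetValue(index)` with an index between `GetLowerBound(0)` and `GetUpperBound(0)`.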
-
Now you may ask, "why is there a LowerBound at all?" COM objects aren't .NET; one could be written in old VB6, which actually had array objects with LowerBound set to 1 (or anything else; VB6 had that freedom, or curse, depending on whom you ask). To access the first element of such an object you would need to use 'comObject(1)' instead of 'comObject(0)'. So the reason to check the lower bound is to know where to start when enumerating such an object: the element functions of the COM object expect the first element to be at the LowerBound value, not at zero (0), so it was reasonable to support the same logic on such instances. Imagine getting the first element at index 0 and then passing that element to a method of some COM object with an index of 1, or even 2001; the code would be very confusing.
To put it simply: it's mostly for legacy support only!

Related

Why Can't I Specify The Size Of The Array Returned From A C# Function?

I know that the following C# code will not compile:
int[10] TestFixedArrayReturn(int n)
{
return new int[10]{n, n, n, n, n, n, n, n, n, n};
}
void TestCall()
{
int[10] result = TestFixedArrayReturn(1);
}
In order to get it to compile, I need to remove the array size from the function declaration (as well as the declaration of the result variable) like so:
int[] TestFixedArrayReturn(int n)
{
return new int[10]{n, n, n, n, n, n, n, n, n, n};
}
void TestCall()
{
int[] result = TestFixedArrayReturn(1);
}
I'm just wondering: why is it that I cannot specify the size of the array of ints which will get returned? I take it what's getting passed back is actually a reference to the array (that'd be my guess anyway), but why can't I specify the size of the array being returned? Wouldn't this allow the compiler to check my code more closely for correctness?
The simple answer I think, is that in .net:
Functions return instances of a type
Therefore, functions are declared as having a return type
Arrays are a single type (Class) in .net, so that they can interoperate
Conversely, however, this means that arrays of different fixed sizes do not have different Types, just different instance attributes.
Which means that functions cannot be defined to return arrays of a specific fixed size because the length of an array is not part of its type specification, but rather part of its instance settings/attributes.
For variables, this seems like it can be done in the type declaration of variables, but it is actually part of the instance initialization specs there, even though syntactically it looks like it’s part of the type declaration. This was probably retained for style compatibility reasons with older languages like C, where things worked much differently under the hood.
For functions, however, the return object is not initialized where it is declared (in the function declaration) but rather procedurally in the body of the function itself. So allowing instance attributes to be set in the function declaration could have caused all kinds of conflicts and edge cases that would need to be checked either by the compiler or at run-time, neither of which was probably seen as being worth the very minimal gain.
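The point that length belongs to the instance rather than the type can be checked directly. This is a small sketch with made-up names, not code from the answer.

```csharp
using System;

static class ArrayTypeDemo
{
    // Arrays of different lengths share one runtime type: int[].
    public static bool SameType(int lengthA, int lengthB) =>
        new int[lengthA].GetType() == new int[lengthB].GetType();

    // The return type says only "int[]"; the length is decided when the instance is created.
    public static int[] Repeat(int value, int count)
    {
        var result = new int[count];
        for (int i = 0; i < count; i++)
        {
            result[i] = value;
        }
        return result;
    }
}
```

`ArrayTypeDemo.SameType(3, 10)` is true, and the caller of `Repeat` learns the size only at run time, from `Length`.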
An array is an array... it doesn't have a size unless you initialize it. When you pass a variable from one method to another, you define its type, not its structure (such as the size of the array).
How would the compiler know at compile time what the size of the array being returned would be? To work this out it would have to execute all the code in your method...

How to get the elements of a System.Numerics.Vector in C#?

I want to access the elements of a System.Numerics.Vector<T> in C#.
I'm following the official documentation: https://learn.microsoft.com/en-us/dotnet/api/system.numerics.vector-1?view=netcore-2.2
I'm able to create different vectors with different datatypes.
For example: var test = new Vector<double>(new double[] { 1.0, 2.0, 1.0 });
But now I have the problem, that I'm unable to call test.Count; it is not possible to call Count on an instance of type System.Numerics.Vector<T>.
I can access single elements with the access operator [], but I don't know how many elements are in the vector.
According to the documentation, there should be the public property:
public static int Count { get; }
But I cannot call it on my instance of System.Numerics.Vector<T>. Instead I can call it only in a static manner, as follows:
Vector<double>.Count
This is equal to 2.
I can also call:
Vector<Int32>.Count
returning: 4 and
Vector<Int16>.Count
returning 8.
And now I'm really a bit confused, about how to use this static property. At first, I thought, that this property would return the number of elements stored in the vector (as stated in the documentation). Second, I thought, that this property returns the size of the vector in memory, but this number increases from double to Int32 to Int16.
Interestingly I can not call this static property from my instance created by:
var test = new Vector<double>(new double[] { 1.0, 2.0, 1.0 });
I can not call test.Count!
Do you know how to access the elements of System.Numerics.Vector<T>?
There is no way to do that. Vector<T> is of a fixed size, since it's trying to optimize for hardware acceleration. The docs state:
The count of a Vector<T> instance is fixed, but its upper limit is CPU-register dependent. It is intended to be used as a building block for vectorizing large algorithms.
Reading the source at https://source.dot.net/#System.Private.CoreLib/shared/System/Numerics/Vector.cs
Shows that it will throw if less data is passed in than is required, and will only take up to Count of the items passed in.
The Vector<T>.Count property only shows how many elements of the concrete type fit in the vector. That is why its value increases from double to short. On your machine only 16 bytes fit in there, therefore 2 doubles, 4 ints, 8 shorts, and so on.
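Putting the two answers together, the elements can be read by looping up to the static `Vector<T>.Count` and using the indexer. The sketch below assumes the `System.Numerics.Vector<T>` API as documented; `VectorDemo`, `ToArray` and `FromPadded` are hypothetical helper names.

```csharp
using System;
using System.Numerics;

static class VectorDemo
{
    // Copies every lane of a Vector<double> into a regular array.
    public static double[] ToArray(Vector<double> vector)
    {
        var result = new double[Vector<double>.Count];
        for (int i = 0; i < Vector<double>.Count; i++)
        {
            result[i] = vector[i];
        }
        return result;
    }

    // Builds a vector from at most Count items, padding with zeros when the
    // input is shorter, so the constructor never sees too little data.
    public static Vector<double> FromPadded(double[] values)
    {
        var padded = new double[Vector<double>.Count];
        Array.Copy(values, padded, Math.Min(values.Length, padded.Length));
        return new Vector<double>(padded);
    }
}
```

How many of the input values survive depends on the hardware: with `Vector<double>.Count` equal to 2, only the first two of three input values end up in the vector.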

Is there a List<T> like dynamic array that allows access to the internal array data in .NET?

Looking over the source of List<T>, it seems that there's no good way to access the private _items array of items.
What I need is basically a dynamic list of structs, which I can then modify in place. From my understanding, because C# 6 doesn't yet support ref return types, you can't have a List<T> return a reference to an element, which requires copying of the whole item, for example:
struct A {
    public int X;
}
void Foo() {
    var list = new List<A> { new A { X = 3 } };
    list[0].X++; // this fails to compile, because the indexer returns a copy
    // a proper way to do this would be
    var copy = list[0];
    copy.X++;
    list[0] = copy;
    var array = new A[] { new A { X = 3 } };
    array[0].X++; // this works just fine
}
Looking at this, it's both clunky from syntax point of view, and possibly much slower than modifying the data in place (Unless the JIT can do some magic optimizations for this specific case? But I doubt they could be relied on in the general case, unless it's a special standardized optimization?)
Now if List<T>._items was protected, one could at least subclass List<T> and create a data structure with specific modify operations available. Is there another data structure in .NET that allows this, or do I have to implement my own dynamic array?
EDIT: I do not want any form of boxing or any form of reference semantics. This code is intended for very high performance, and the reason I'm using an array of structs is to have them tightly packed in memory (and not scattered around the heap, resulting in cache misses).
I want to modify the structs in place because it's part of a performance-critical algorithm that stores some of its data in those structs.
Is there another data structure in .NET that allows this, or do I have to implement my own dynamic array?
Neither.
There isn't, and can't be, a data structure in .NET that avoids the structure copy, because deep integration with the C# language is needed to get around the "indexed getter makes a copy" issue. So you're right to think in terms of directly accessing the array.
But you don't have to build your own dynamic array from scratch. Many List<T>-like operations such as Resize and bulk movement of items are provided for you as static methods on type System.Array. They come in generic flavors, so no boxing is involved.
The unfortunate thing is that the high-performance Buffer.BlockCopy, which should work on any blittable type, actually contains a hard-coded check for primitive types and refuses to work on any structure.
So just go with T[] (plus int Count -- array length isn't good enough because trying to keep capacity equal to count is very inefficient) and use System.Array static methods when you would otherwise use methods of List<T>. If you wrap this as a PublicList<T> class, you can get reusability and both the convenience of methods for Add, Insert, Sort as well as direct element access by indexing directly on the array. Just exercise some restraint and never store the handle to the internal array, because it will become out-of-date the next time the list needs to grow its capacity. Immediate direct access is perfectly fine though.
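A minimal sketch of such a wrapper might look like the following. `PublicList<T>` is the name suggested above, not a framework type, and the initial capacity and doubling growth policy are assumptions made for illustration.

```csharp
using System;

// Hypothetical wrapper: a growable array that exposes its backing storage.
public class PublicList<T>
{
    // Exposed on purpose so callers can write list.Items[i].Field++ in place.
    // Do not cache this reference: Add may replace the array when it grows.
    public T[] Items = new T[4];

    // Number of slots in use; Items.Length is the capacity.
    public int Count { get; private set; }

    public void Add(T item)
    {
        if (Count == Items.Length)
        {
            // Generic Array.Resize copies the elements without boxing.
            Array.Resize(ref Items, Items.Length * 2);
        }
        Items[Count++] = item;
    }
}
```

With the struct `A` from the question, `list.Items[0].X++` then modifies the stored element directly, because array element access yields the element itself and not a copy.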

Storing a C# reference to an array of structs and retrieving it - possible without copying?

UPDATE: the next version of C# has a feature under consideration that would directly answer this issue. c.f. answers below.
Requirements:
App data is stored in arrays-of-structs. There is one AoS for each type of data in the app (e.g. one for MyStruct1, another for MyStruct2, etc)
The structs are created at runtime; the more code we write in the app, the more there will be.
I need one class to hold references to ALL the AoS's, and allow me to set and get individual structs within those AoS's
The AoS's tend to be large (1,000's of structs per array); copying those AoS's around would be a total fail - they should never be copied! (they never need to!)
I have code that compiles and runs, and it works ... but is C# silently copying the AoS's under the hood every time I access them? (see below for full source)
public Dictionary<System.Type, System.Array> structArraysByType;
public void registerStruct<T>()
{
System.Type newType = typeof(T);
if( ! structArraysByType.ContainsKey(newType ) )
{
structArraysByType.Add(newType, new T[1000] ); // allowing up to 1k
}
}
public T get<T>( int index )
{
return ((T[])structArraysByType[typeof(T)])[index];
}
public void set<T>( int index, T newValue )
{
((T[])structArraysByType[typeof(T)])[index] = newValue;
}
Notes:
I need to ensure C# sees this as an array of value-types, instead of an array of objects ("don't you DARE go making an array of boxed objects around my structs!"). As I understand it: Generic T[] ensures that (as expected)
I couldn't figure out how to express the type "this will be an array of structs, but I can't tell you which structs at compile time" other than System.Array. System.Array works -- but maybe there are alternatives?
In order to index the resulting array, I have to typecast back to T[]. I am scared that this typecast MIGHT be boxing the Array-of-Structs; I know that if it were (T) instead of (T[]), it would definitely box; hopefully it doesn't do that with T[] ?
Alternatively, I can use the System.Array methods, which definitely boxes the incoming and outgoing struct. This is a fairly major problem (although I could workaround it if were the only way to make C# work with Array-of-struct)
As far as I can see, what you are doing should work fine, but yes it will return a copy of a struct T instance when you call Get, and perform a replacement using a stack based instance when you call Set. Unless your structs are huge, this should not be a problem.
If they are huge and you want to
Read (some) properties of a struct instance in your array without creating a copy of it.
Update some of its fields (which means your structs are not immutable; mutable structs are generally a bad idea, but there are good reasons for using them)
then you can add the following to your class:
public delegate void Accessor<T>(ref T item) where T : struct;
public delegate TResult Projector<T, TResult>(ref T item) where T : struct;
// The methods need the same struct constraint as the delegates they use.
public void Access<T>(int index, Accessor<T> accessor) where T : struct
{
    var array = (T[])structArraysByType[typeof(T)];
    accessor(ref array[index]);
}
public TResult Project<T, TResult>(int index, Projector<T, TResult> projector) where T : struct
{
    var array = (T[])structArraysByType[typeof(T)];
    return projector(ref array[index]);
}
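For illustration, here is a self-contained sketch of how such an accessor might be called. `StructStore`, `Register` and `Particle` are hypothetical stand-ins for the asker's class and structs; only the delegate-with-ref idea comes from the code above.

```csharp
using System;
using System.Collections.Generic;

public delegate void Accessor<T>(ref T item) where T : struct;

public struct Particle // hypothetical example struct
{
    public int Hits;
}

public class StructStore // hypothetical container mirroring the class above
{
    private readonly Dictionary<Type, Array> structArraysByType =
        new Dictionary<Type, Array>();

    public void Register<T>(int capacity) where T : struct
    {
        structArraysByType[typeof(T)] = new T[capacity];
    }

    public void Access<T>(int index, Accessor<T> accessor) where T : struct
    {
        var array = (T[])structArraysByType[typeof(T)];
        accessor(ref array[index]); // passes the element itself, not a copy
    }

    public T Get<T>(int index) where T : struct
    {
        return ((T[])structArraysByType[typeof(T)])[index]; // returns a copy
    }
}
```

A call such as `store.Access<Particle>(3, (ref Particle p) => p.Hits++);` then updates the element stored in the array; the lambda needs explicit parameter types because of the `ref` modifier.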
Or simply return a reference to the underlying array itself, if you don't need to abstract it / hide the fact that your class encapsulates them:
public T[] GetArray<T>()
{
return (T[])structArraysByType[typeof(T)];
}
From which you can then simply access the elements:
var myThingsArray = MyStructArraysType.GetArray<MyThing>();
var someFieldValue = myThingsArray[10].SomeField;
myThingsArray[3].AnotherField = "Hello";
Alternatively, if there is no specific reason for them to be structs (i.e. to ensure sequential cache friendly fast access), you might want to simply use classes.
There is a much better solution that is planned for adding to next version of C#, but does not yet exist in C# - the "return ref" feature of .NET already exists, but isn't supported by the C# compiler.
Here's the Issue for tracking that feature: https://github.com/dotnet/roslyn/issues/118
With that, the entire problem becomes trivial "return ref the result".
(answer added for future, when the existing answer will become outdated (I hope), and because there's still time to comment on that proposal / add to it / improve it!)
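Ref returns did later ship in C# 7.0. Assuming a compiler of that version or newer, "return ref the result" could be sketched roughly as follows; `RefStructStore`, `Register` and `GetRef` are hypothetical names, not the asker's code.

```csharp
using System;
using System.Collections.Generic;

public class RefStructStore // hypothetical name
{
    private readonly Dictionary<Type, Array> arraysByType =
        new Dictionary<Type, Array>();

    public void Register<T>(int capacity) where T : struct
    {
        arraysByType[typeof(T)] = new T[capacity];
    }

    // Returns a reference to the element itself (requires C# 7.0 or later).
    public ref T GetRef<T>(int index) where T : struct
    {
        var array = (T[])arraysByType[typeof(T)];
        return ref array[index];
    }
}
```

The caller can then write `store.GetRef<MyStruct1>(5).SomeField = value;` (with `SomeField` standing in for any field of the struct) and the change lands in the array without a struct copy.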

Are array/list lengths and string hashcodes memoized?

Does anyone happen to know if, in .NET (4.0, if it matters), the Length property of an array, or the Count property of a List<T> is stored after it's been calculated, until the array/list is changed?
I ask because a record linkage program I'm working on is already horrifically complex enough as it is, and I'd rather not add another O(n) on top of things if I can help it.
Along similar lines, is the hashcode of an instantiated System.String memoized? Looking at it through the debugger, I can see that List<T> has a private _size member that could be where Count gets its value, but I see nothing for int[] or string indicating that they store it anywhere.
I can see the size vs. speed tradeoff, but can anyone tell me for sure if there are backing fields that hold these? For example, since in C#, strings are immutable, wouldn't it make sense to calculate the hashcode the first time GetHashCode is called, and just store it for every later use?
From http://msdn.microsoft.com/en-us/library/6sh2ey19(v=VS.100).aspx
The List class is the generic equivalent of the ArrayList class. It
implements the IList generic interface using an array whose size is
dynamically increased as required.
This would suggest that the length is known at any point in time and need not be calculated.
Using reflector the implementation of List<T>.Count is:
public int Count
{
get
{
return this._size;
}
}
I'm not sure about Array.Length. The code for it is:
public int Length
{
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success),
SecuritySafeCritical, MethodImpl(MethodImplOptions.InternalCall)]
get;
}
From the .NET Reference Source Code, I looked at List. It holds a variable _size. Therefore whenever an item is added or removed it increments or decrements the size. So it is not really calculated and is always stored in a variable.
