Does anyone happen to know if, in .NET (4.0, if it matters), the Length property of an array, or the Count property of a List<T> is stored after it's been calculated, until the array/list is changed?
I ask because a record linkage program I'm working on is already horrifically complex enough as it is, and I'd rather not add another O(n) on top of things if I can help it.
Along similar lines, is the hash code of an instantiated System.String memoized? Looking at it through the debugger, I can see that List<T> has a private _size member that could be where Count gets its value, but I see nothing for int[] or string indicating that they store it anywhere.
I can see the size vs. speed tradeoff, but can anyone tell me for sure if there are backing fields that hold these? For example, since strings in C# are immutable, wouldn't it make sense to calculate the hash code the first time GetHashCode is called, and just store it for every later use?
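Whether or not System.String caches its own hash code, you can cache it yourself when the same keys are hashed again and again (for example, as dictionary keys in a record linkage pass). A minimal sketch, assuming you control the key type: HashedKey is a hypothetical wrapper, not a BCL type. It computes the hash eagerly in the constructor rather than on the first GetHashCode call, so the field can be readonly and needs no synchronization.

```csharp
using System;

// Hypothetical wrapper (not a BCL type): hashes the string once and reuses it.
public sealed class HashedKey : IEquatable<HashedKey>
{
    public readonly string Value;
    private readonly int hash;

    public HashedKey(string value)
    {
        if (value == null) throw new ArgumentNullException("value");
        Value = value;
        hash = value.GetHashCode(); // computed exactly once, at construction
    }

    public override int GetHashCode()
    {
        return hash; // no rehashing of the string on later calls
    }

    public bool Equals(HashedKey other)
    {
        // Compare the cached hashes first: a cheap rejection before the full string comparison.
        return other != null && hash == other.hash && string.Equals(Value, other.Value);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as HashedKey);
    }
}
```

This is the size vs. speed tradeoff from the question made explicit: an extra object and int per key in exchange for never rehashing the string.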
From http://msdn.microsoft.com/en-us/library/6sh2ey19(v=VS.100).aspx
The List class is the generic equivalent of the ArrayList class. It implements the IList generic interface using an array whose size is dynamically increased as required.
This suggests that the size of a List<T> (its Count) is known at any point in time and need not be calculated.
Using Reflector, the implementation of List<T>.Count is:
public int Count
{
get
{
return this._size;
}
}
I'm not sure about Array.Length. The code for it is:
public int Length
{
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success),
SecuritySafeCritical, MethodImpl(MethodImplOptions.InternalCall)]
get;
}
I looked at List<T> in the .NET Reference Source. It holds a field, _size, which is incremented or decremented whenever an item is added or removed. So Count is not really calculated; it is always stored in a field.
Although it's perhaps a bizarre thing to want to do, I need to create an Array in .NET with a lower bound > 0. At first this seems to be possible, using:
Array.CreateInstance(typeof(Object), new int[] {2}, new int[] {9});
This produces the desired result (an array of objects with a lower bound of 9). However, the created array instance can no longer be passed to other methods expecting Object[]; I get an error saying that System.Object[*] cannot be cast to System.Object[]. What is the difference between these array types, and how can I overcome this?
Edit: test code:
Object x = Array.CreateInstance(typeof(Object), new int[] {2}, new int[] {9});
Object[] y = (Object[])x;
Which fails with: "Unable to cast object of type 'System.Object[*]' to type 'System.Object[]'."
I would also like to note that this approach DOES work when using multiple dimensions:
Object x = Array.CreateInstance(typeof(Object), new int[] {2,2}, new int[] {9,9});
Object[,] y = (Object[,])x;
Which works fine.
The reason why you can't cast from one to the other is that this is evil.
Let's say you create an array object[5..9] and pass it to a function F as an object[].
How would the function know that this is a 5..9 array? F is expecting a general array, but it's getting a constrained one. You could say it's possible for it to know, but this is still unexpected, and people don't want to do all sorts of boundary checks every time they want to use a simple array.
An array is the simplest structure in programming; making it too complicated makes it unusable. You probably need another structure.
What you could do is write a class that is a constrained collection mimicking the behavior you want. That way, all users of that class will know what to expect.
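using System.Collections;
using System.Collections.Generic;

// A fixed-size collection whose valid indices run from Min to Max (inclusive).
class ConstrainedArray<T> : IEnumerable<T>
{
    private readonly T[] array;

    public ConstrainedArray(int min, int max)
    {
        Min = min;
        Max = max;
        // Inclusive bounds: min..max holds (max - min + 1) elements.
        array = new T[max - min + 1];
    }

    public int Min { get; private set; }
    public int Max { get; private set; }

    // Translate the caller's index into the zero-based backing array.
    public T this[int index]
    {
        get { return array[index - Min]; }
        set { array[index - Min] = value; }
    }

    public IEnumerator<T> GetEnumerator()
    {
        // T[] implements IEnumerable<T> explicitly, so cast to get the generic enumerator.
        return ((IEnumerable<T>)array).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return array.GetEnumerator();
    }
}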
I'm not sure why that can't be passed as Object[], but wouldn't it be easier to just create a real class that wraps an array and handles your "weird logic" in there?
You'd get the benefits of using a real reference object, where you could add "intelligence" to your class.
Edit: How are you casting your Array, could you post some more code? Thanks.
Just store your lower bound in a const offset integer, and subtract that value from whatever your source returns as the index.
Also: this is an old VB6 feature. I think there might be an attribute to help support it.
The .NET CLR differentiates between two internal array object formats: SZ arrays (single-dimensional, zero-based) and MD arrays. MD arrays can be multi-dimensional and store their lower bounds in the object.
The reason for this difference is two-fold:
Efficient code generation for single-dimensional arrays requires that there is no lower bound. Having a lower bound is incredibly uncommon. We would not want to sacrifice significant performance in the common case for this rarely used feature.
Most code expects arrays with zero lower bound. We certainly don't want to pollute all of our code with checking the lower bound or adjusting loop bounds.
These concerns are solved by making a separate CLR type for SZ arrays. This is the type that almost all practically occurring arrays are using.
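One common way to get past the cast error, assuming a zero-based copy is acceptable (changes to the copy are not written back to the original), is to copy the elements into an ordinary SZ array. A minimal sketch; LowerBoundArrays and ToZeroBased are hypothetical names:

```csharp
using System;

static class LowerBoundArrays
{
    // Copies a one-dimensional array with any lower bound (e.g. System.Object[*])
    // into an ordinary zero-based object[] (an SZ array).
    public static object[] ToZeroBased(Array source)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (source.Rank != 1)
            throw new ArgumentException("Only one-dimensional arrays are supported.", "source");

        var result = new object[source.Length];
        // This Array.Copy overload starts at the first element of each array,
        // i.e. at index 9 of a 9-based source and at index 0 of the result.
        Array.Copy(source, result, source.Length);
        return result;
    }
}
```

The result can then be passed to any method expecting Object[].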
I know it's an old question, but to explain it fully:
If a type (in this case, a single-dimensional array with a lower bound > 0) can't be created by typed code, then an instance of that type created through reflection can't be consumed by typed code either.
What you have noticed is already in documentation:
https://learn.microsoft.com/en-us/dotnet/framework/reflection-and-codedom/specifying-fully-qualified-type-names
Note that from a runtime point of view, MyArray[] != MyArray[*], but for multidimensional arrays, the two notations are equivalent. That is, Type.GetType("MyArray [,]") == Type.GetType("MyArray[*,*]") evaluates to true.
In C#/VB/..., you can keep that reflected array in an object variable, pass it around as an object, and use only reflection to access its items.
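A minimal sketch of that kind of access, using the non-generic Array API (GetLowerBound, GetUpperBound, GetValue) so it works for any one-dimensional array, zero-based or not. BoundsAwareAccess and JoinElements are hypothetical names:

```csharp
using System;

static class BoundsAwareAccess
{
    // Visits every element of a one-dimensional array, starting at the
    // array's actual lower bound instead of assuming zero.
    public static string JoinElements(Array source, string separator)
    {
        var parts = new string[source.Length];
        int lower = source.GetLowerBound(0);
        int upper = source.GetUpperBound(0);
        for (int i = lower; i <= upper; i++)
        {
            // GetValue takes the "real" index, e.g. 9 and 10 for bounds 9..10.
            object value = source.GetValue(i);
            parts[i - lower] = value == null ? "" : value.ToString();
        }
        return string.Join(separator, parts);
    }
}
```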
Now you may ask, "Why is there a lower bound at all?" Well, COM objects aren't .NET; they could be written in old VB6, which actually had array objects with the lower bound set to 1 (or anything else; VB6 had that freedom, or curse, depending on whom you ask). To access the first element of such an object, you would actually need to use comObject(1) instead of comObject(0). So the reason to check the lower bound is to know where to start when enumerating such an object: since the element functions of a COM object expect the first element to be at the lower bound, not at zero (0), it was reasonable to support the same logic on such instances. Imagine getting the value of the first element at index 0, and then passing that element to a COM method with an index of 1, or even 2001: the code would be very confusing.
To put it simply: it's mostly for legacy support only!
Looking over the source of List<T>, it seems that there's no good way to access the private _items array of items.
What I need is basically a dynamic list of structs, which I can then modify in place. As I understand it, because C# 6 doesn't yet support ref return types, a List<T> can't return a reference to an element; its indexer returns a copy of the whole item instead. For example:
using System.Collections.Generic;

struct A {
    public int X;
}

static class Demo {
    static void Foo() {
        var list = new List<A> { new A { X = 3 } };
        // list[0].X++; // fails to compile (CS1612), because the indexer returns a copy
        // a proper way to do this would be
        var copy = list[0];
        copy.X++;
        list[0] = copy;
        var array = new A[] { new A { X = 3 } };
        array[0].X++; // this works just fine: an array element is a variable, not a copy
    }
}
Looking at this, it's both clunky from a syntax point of view and possibly much slower than modifying the data in place (unless the JIT can do some magic optimization for this specific case? But I doubt that could be relied on in the general case, unless it's a special standardized optimization).
Now if List<T>._items was protected, one could at least subclass List<T> and create a data structure with specific modify operations available. Is there another data structure in .NET that allows this, or do I have to implement my own dynamic array?
EDIT: I do not want any form of boxing or any form of reference semantics. This code is intended for very high performance, and the reason I'm using an array of structs is to have them tightly packed in memory (and not scattered around the heap, resulting in cache misses).
I want to modify the structs in place because it's part of a performance-critical algorithm that stores some of its data in those structs.
Is there another data structure in .NET that allows this, or do I have to implement my own dynamic array?
Neither.
There isn't, and can't be, a data structure in .NET that avoids the structure copy, because deep integration with the C# language is needed to get around the "indexed getter makes a copy" issue. So you're right to think in terms of directly accessing the array.
But you don't have to build your own dynamic array from scratch. Many List<T>-like operations such as Resize and bulk movement of items are provided for you as static methods on type System.Array. They come in generic flavors, so no boxing is involved.
The unfortunate thing is that the high-performance Buffer.BlockCopy, which should work on any blittable type, actually contains a hard-coded check for primitive types and refuses to work on any structure.
So just go with T[] (plus int Count -- array length isn't good enough because trying to keep capacity equal to count is very inefficient) and use System.Array static methods when you would otherwise use methods of List<T>. If you wrap this as a PublicList<T> class, you can get reusability and both the convenience of methods for Add, Insert, Sort as well as direct element access by indexing directly on the array. Just exercise some restraint and never store the handle to the internal array, because it will become out-of-date the next time the list needs to grow its capacity. Immediate direct access is perfectly fine though.
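A minimal sketch of such a PublicList<T>, assuming only Add, Insert and Sort are needed (a real version would add Remove, Clear and so on). The member layout is an assumption, not a standard type:

```csharp
using System;
using System.Collections.Generic;

// Sketch of the PublicList<T> idea: a growable array whose backing store
// is public so that struct elements can be modified in place.
public sealed class PublicList<T>
{
    // Exposed on purpose, but never cache this reference: EnsureCapacity
    // replaces it with a larger array when the list grows.
    public T[] Items = new T[4];

    // Number of slots in Items that are actually in use.
    public int Count;

    public void Add(T item)
    {
        EnsureCapacity(Count + 1);
        Items[Count++] = item;
    }

    public void Insert(int index, T item)
    {
        if (index < 0 || index > Count) throw new ArgumentOutOfRangeException("index");
        EnsureCapacity(Count + 1);
        // Shift the tail right by one; Array.Copy copes with overlapping ranges.
        Array.Copy(Items, index, Items, index + 1, Count - index);
        Items[index] = item;
        Count++;
    }

    public void Sort(IComparer<T> comparer)
    {
        // Generic overload: sorts only the Count elements in use, no boxing.
        Array.Sort(Items, 0, Count, comparer);
    }

    private void EnsureCapacity(int required)
    {
        if (Items.Length >= required) return;
        // Doubling keeps Add amortized O(1).
        Array.Resize(ref Items, Math.Max(Items.Length * 2, required));
    }
}
```

With struct A from the question, list.Items[0].X++ compiles and updates the stored element in place, because indexing an array yields a variable rather than a copy.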
I was reading XNA library code, and in the type VertexPositionColor they suppress the CA2105:ArrayFieldsShouldNotBeReadOnly message with the justification "The performance cost of cloning the array each time it is used is too great."
public struct VertexPositionColor
{
public static readonly VertexElement [ ] VertexElements;
}
But why would it be copied when it's used? This only happens for structs where the accessed property/field is a ValueType, right?
I guess they are justifying the fact that they expose an array field more than anything else, and the underlying reason they do so is performance:
The alternative they probably had in mind was making the array field private, with a property exposing an IEnumerable or returning a copy of the array each time the property is accessed.
EDIT. Edited the answer a little to make clearer what I was trying to say :p.
In most cases they'd be better off using Array.AsReadOnly and returning a generic ReadOnlyCollection. According to the documentation that's an O(1) operation.
In the current implementation callers can change the values in the array (modifying the static/global state directly).
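A minimal sketch of the Array.AsReadOnly alternative. VertexLayout and its int element sizes are hypothetical stand-ins for the XNA types. Every read then goes through the wrapper's indexer rather than straight into the array, which is presumably the kind of overhead the XNA team wanted to avoid.

```csharp
using System;
using System.Collections.ObjectModel;

// Hypothetical type standing in for something like VertexPositionColor.
public static class VertexLayout
{
    // The array itself stays private, so callers cannot overwrite its slots.
    private static readonly int[] elementSizes = { 12, 4 };

    // Array.AsReadOnly wraps the array without copying it (O(1));
    // the wrapper rejects writes made through IList<T>.
    private static readonly ReadOnlyCollection<int> readOnlyElementSizes =
        Array.AsReadOnly(elementSizes);

    public static ReadOnlyCollection<int> ElementSizes
    {
        get { return readOnlyElementSizes; }
    }
}
```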
One more reason to read Framework Design Guidelines - it gives you the reasons behind FxCop's recommendations.
Some C# collections have a Count property and some have a Length property. Is there a rule of thumb for which one has which, and why the discrepancy?
I'd say the general rule of thumb is the following:
Count is for collections with a variable length, e.g. lists (from ICollection).
Length is for fixed-length collections, e.g. arrays, or other immutable objects, e.g. string.
UPDATE:
Just to elaborate: Count comes from ICollection and doesn't always indicate variability. For example (as per Greg Beech's comment), ReadOnlyCollection<T> has a Count property but is not variable; however, it does implement ICollection.
Perhaps a more exact rule of thumb would be:
Count indicates that something implements ICollection.
Length indicates immutability.
If the type implements ICollection, it will have the Count property. Length, on the other hand, is not standard; it is defined as a property of the Array class, so all fixed-size arrays have it as well.
As others have said Count comes from ICollection and Length is specifically defined on certain types, typically types that are immutable such as String and Array.
To me, Count implies mutability as the count of something can easily change. Length feels more immutable. The length of a given object usually doesn't change without drastic measure.
Also keep in mind there is the LINQ extension method Count(), which provides a common interface to both of these properties. LINQ is smart enough to compute Count() as efficiently as it can: if the source implements ICollection<T> (as arrays and List<T> do), it simply returns the Count property, which for an array is its Length; otherwise (for example, for a string) it enumerates the sequence. So it's a decent alternative.
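A short sketch showing the spellings side by side (SizeDemo is a hypothetical name). Note that string does not implement ICollection<char>, so calling Count() on it falls back to enumerating the characters.

```csharp
using System.Collections.Generic;
using System.Linq;

static class SizeDemo
{
    public static int[] Sizes()
    {
        int[] array = { 1, 2, 3 };
        var list = new List<int> { 1, 2, 3, 4 };
        string text = "hello";
        return new[]
        {
            array.Length,  // Length: defined on arrays
            list.Count,    // Count: from ICollection<T>
            text.Length,   // Length: defined on string
            array.Count(), // LINQ: arrays implement ICollection<T>, so no enumeration
            text.Count(),  // LINQ: string is only IEnumerable<char>, so it enumerates
        };
    }
}
```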
I basically want to know the differences or advantages of using a generic list instead of an array in the scenario below:
class Employee
{
private string _empName;
public string EmpName
{
get{ return _empName; }
set{ _empName = value; }
}
}
1. Employee[] emp
2. List<Employee> emp
Can anyone please tell me the advantages or disadvantages, and which one to prefer?
One big difference is that List<Employee> can be expanded (you can call Add on it) or contracted (you can call Remove on it) whereas Employee[] is fixed in size. Thus, Employee[] is tougher to work with unless the need calls for it.
The biggest difference is that arrays can't be made longer or shorter once they're created. List instances, however, can have elements added or removed. There are other differences too (e.g. different sets of methods available), but add/remove is the big one.
I like List unless there's a really good reason to use an Array, since the flexibility of List is nice and the perf penalty is very small relative to the cost of most other things your code is usually doing.
If you want to dive into a lot of interesting technical detail, check out this StackOverflow thread which delves into the List vs. Array question in more depth.
With the generic list, you can Add / Remove etc cheaply (at least, at the far end). Resizing an array (to add/remove) is more expensive. The obvious downside is that a list has spare capacity so maybe wastes a few bytes - not worth worrying about in most cases, though (and you can trim it).
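A minimal sketch of the spare-capacity point, using List<T>.Capacity and List<T>.TrimExcess (CapacityDemo is a hypothetical name). TrimExcess only shrinks the backing array when a meaningful share of it is unused, so treat it as a hint rather than a guarantee.

```csharp
using System.Collections.Generic;

static class CapacityDemo
{
    // Returns { count, capacity before trimming, capacity after trimming }.
    public static int[] Measure()
    {
        var list = new List<int>();
        for (int i = 0; i < 5; i++)
            list.Add(i); // amortized O(1): the backing array grows geometrically

        int before = list.Capacity; // typically larger than Count (spare room)
        list.TrimExcess();          // shrinks the backing array to fit, if worthwhile
        return new[] { list.Count, before, list.Capacity };
    }
}
```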
Generally, prefer lists unless you know your data never changes size.
API-wise, since LINQ there is little to choose between them (i.e. the extra methods on List<T> are largely duplicated by LINQ, so arrays get them for free).
Another advantage is that with a list you don't need to expose a setter:
private readonly List<Foo> items = new List<Foo>();
public List<Foo> Items { get { return items; } }
eliminating a range of null bugs, and allowing you to keep control over the data (especially if you use a different IList<> implementation that supports inspection / validation when changing the contents).
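A minimal sketch of that pattern, with a hypothetical Team class and a hypothetical NonNullCollection<T> built on System.Collections.ObjectModel.Collection<T>, whose protected InsertItem and SetItem hooks let you validate every change:

```csharp
using System;
using System.Collections.ObjectModel;

// Hypothetical collection that rejects null items on add and on replace.
public sealed class NonNullCollection<T> : Collection<T> where T : class
{
    protected override void InsertItem(int index, T item)
    {
        if (item == null) throw new ArgumentNullException("item");
        base.InsertItem(index, item);
    }

    protected override void SetItem(int index, T item)
    {
        if (item == null) throw new ArgumentNullException("item");
        base.SetItem(index, item);
    }
}

// Hypothetical owner: the property has no setter, so the collection
// reference itself can never be replaced or set to null.
public class Team
{
    private readonly NonNullCollection<string> members = new NonNullCollection<string>();

    public NonNullCollection<string> Members
    {
        get { return members; }
    }
}
```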
If you are exposing a collection in a public interface, the .NET Framework Design Guidelines advise using a collection type rather than T[]; specifically Collection<T> or a subclass of it (such as BindingList<T>), rather than List<T> itself.
Internally, an array can be more appropriate if you have a collection which is a fixed, known size. Resizing an array is expensive compared to adding an element to the end of a List.
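A small sketch of why resizing an array costs more: Array.Resize allocates a new array, copies the elements and stores the new reference, so anything still holding the old array keeps seeing the old size. ResizeDemo is a hypothetical name.

```csharp
using System;

static class ResizeDemo
{
    public static bool ResizeAllocatesNewArray()
    {
        int[] original = { 1, 2, 3 };
        int[] resized = original;
        // Allocates a new 4-element array, copies the 3 elements (O(n)),
        // and assigns the new array to the ref variable.
        Array.Resize(ref resized, 4);
        resized[3] = 4;
        // The old reference still points at the old 3-element array.
        return !ReferenceEquals(original, resized)
            && original.Length == 3
            && resized.Length == 4;
    }
}
```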
You need to know the size of an array at the time that it is created, but you cannot change its size after it has been created.
So, it uses dynamic memory allocation for the array at creation time. (This differs from static memory allocation as used for C++ arrays, where the size must be known at compile time.)
A list can grow dynamically AFTER it has been created, and it has the .Add() function to do that.
-from MSDN
Generics Vs Array Lists - SO General comparison.
Generic List vs Arrays-SO Why is generic list slower than array?
Which one to prefer? List<T>.
If you know the number of elements, an array is a good choice. If not, use a list. Internally, List<T> uses an array of T, so they are actually more alike than you may think.
With a List, you don't need to know the size beforehand. You can dynamically add new Employee objects as your implementation needs them.