Properties should not return arrays - c#

Yes, I know this has been discussed many times before, and I read all the posts and comments regarding this question, but still can't seem to understand something.
One of the options that MSDN offers to solve this violation, is by returning a collection (or an interface which is implemented by a collection) when accessing the property, however clearly it does not solve the problem because most collections are not immutable and can also be changed.
Another possibility I've seen in the answers and comments to this question is to wrap the array in a ReadOnlyCollection and return it, or a base interface of it (like IReadOnlyCollection), but I don't understand how this solves the performance issue.
If at any time the property is referenced it needs to allocate memory for a new ReadOnlyCollection that encapsulates the array, so what is the difference (in a manner of performance issues, not editing the array/collection) than simply returning a copy of the original array?
Moreover, ReadOnlyCollection has only one constructor with IList argument so there's a need to wrap the array with a list prior to creating it.
If I intentionally want to work with an array inside my class (not as immutable collection), is the performance better when I allocate new memory for a ReadOnlyCollection and encapsulate my array with it instead of returning a copy of the array?
Please clarify this.

If at any time the property is referenced it needs to allocate memory for a new ReadOnlyCollection that encapsulates the array, so what is the difference (in a manner of performance issues, not editing the array/collection) than simply returning a copy of the original array?
A ReadOnlyCollection<T> wraps a collection - it doesn't copy the collection.
Consider:
public class Foo
{
private readonly int[] array; // Initialized in constructor
public IReadOnlyList<int> Array => array.ToArray(); // Copy
public IReadOnlyList<int> Wrapper => new ReadOnlyCollection<int>(array); // Wrap
}
Imagine your array contains a million entries. Consider the amount of work that the Array property has to do - it's got to take a copy of all million entries. Consider the amount of work that the Wrapper property has to do - it's got to create an object which just contains a reference.
Additionally, if you don't mind a small extra memory hit, you can do it once instead:
public class Foo
{
private readonly int[] array; // Initialized in constructor
public IReadOnlyList<int> Wrapper { get; } // Get-only auto-property, assigned once in the constructor
public Foo(...)
{
array = ...;
Wrapper = new ReadOnlyCollection<int>(array);
}
}
Now accessing the Wrapper property doesn't involve any allocation at all - it doesn't matter if all callers see the same wrapper, because they can't mutate it.
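To make the "wrap, don't copy" point concrete, here is a minimal sketch of the cached-wrapper version. The size parameter and the SetValue helper are hypothetical additions that exist only so the demo can construct the class and change the private array.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class Foo
{
    private readonly int[] array;

    // Allocated once in the constructor and handed out to every caller.
    public IReadOnlyList<int> Wrapper { get; }

    public Foo(int size)
    {
        array = new int[size];
        // The wrapper stores a reference to the array; no elements are copied.
        Wrapper = new ReadOnlyCollection<int>(array);
    }

    // Hypothetical helper so the demo can modify the private array.
    public void SetValue(int index, int value) => array[index] = value;
}

public static class Program
{
    public static void Main()
    {
        var foo = new Foo(1000000);

        // Every access returns the same wrapper instance.
        Console.WriteLine(ReferenceEquals(foo.Wrapper, foo.Wrapper)); // True

        // Changes made inside the class are visible through the wrapper,
        // which shows that it is a view over the array rather than a copy.
        foo.SetValue(0, 42);
        Console.WriteLine(foo.Wrapper[0]); // 42
    }
}
```

Note that the wrapper is read-only for callers, not a snapshot: if the class itself keeps mutating the array, callers will see those changes.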

You have no need to copy an array, just return it as IReadOnlyCollection<T>:
public class MyClass {
private int[] myArray = ...
public IReadOnlyCollection<int> MyArray {
get {
return myArray;
}
}
}
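One caveat with returning the array directly behind a read-only interface: the object the caller receives is still the array, so a cast gets the mutable view back. The sketch below illustrates this; the property names Direct and Wrapped are hypothetical and chosen only to contrast the two approaches.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class MyClass
{
    private readonly int[] myArray = { 1, 2, 3 };

    // Exposes the array itself, typed as a read-only interface.
    public IReadOnlyCollection<int> Direct => myArray;

    // Exposes a wrapper object; the array is not reachable by casting it.
    public IReadOnlyCollection<int> Wrapped => new ReadOnlyCollection<int>(myArray);
}

public static class Program
{
    public static void Main()
    {
        var c = new MyClass();

        // The cast succeeds because the returned object really is an int[].
        if (c.Direct is int[] raw)
        {
            raw[0] = 99;
        }
        Console.WriteLine(string.Join(",", c.Direct)); // 99,2,3

        // The wrapper is a different type, so the same cast fails.
        Console.WriteLine(c.Wrapped is int[]); // False
    }
}
```

Whether this matters depends on how much you trust your callers; the interface documents the intent, while the wrapper enforces it against casts.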

Related

Use case to understand why a list of strings should be declared as readonly

I am trying to understand what use cases would require me to declare a List<string> as a ReadOnly type.
An associated question with this is: How much memory upon instantiation of a list gets allocated?
The main reason to mark a field as readonly is so that you know that regular code cannot have swapped the list reference. One key scenario where that might matter is if you have other code in the type that is performing synchronization against the list using a lock(theListField). Obviously if someone swaps the list instance: things will break. Note that in most types that have a list/collection, it isn't expected to change the instance, so this readonly asserts that expectation. A common pattern is:
private List<Foo> _items = new List<Foo>();
public List<Foo> Items => _items;
or:
public List<Foo> Items {get;} = new List<Foo>();
In the first example, it should be perfectly fine to mark that field as readonly:
private readonly List<Foo> _items = new List<Foo>();
Marking a field as readonly has no impact on allocations etc. It also doesn't make the list read-only: just the field. You can still Add() / Remove() / Clear() etc. The only thing you can't do is change the list instance to be a completely different list instance; you can, of course, still completely change the contents. And read-only is a lie anyway: reflection and unsafe code can modify the value of a readonly field.
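A small sketch of that distinction, using a hypothetical Inventory class: the readonly modifier protects the field's reference, while the list's contents stay fully mutable.

```csharp
using System;
using System.Collections.Generic;

public class Inventory
{
    private readonly List<string> _items = new List<string>();

    public List<string> Items => _items;

    public void Reset()
    {
        // _items = new List<string>(); // would not compile: the field is readonly
        _items.Clear();                 // fine: this mutates the list, not the field
    }
}

public static class Program
{
    public static void Main()
    {
        var inventory = new Inventory();
        List<string> before = inventory.Items;

        inventory.Items.Add("apple");
        inventory.Items.Add("pear");
        Console.WriteLine(inventory.Items.Count); // 2

        inventory.Reset();
        Console.WriteLine(inventory.Items.Count); // 0

        // The instance is still the same one, so code locking on it stays valid.
        Console.WriteLine(ReferenceEquals(before, inventory.Items)); // True
    }
}
```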
There is one scenario where readonly can have a negative impact, and that relates to large struct fields and calling methods on them. If the field is readonly, the compiler copies the struct onto the stack before calling the method - rather than executing the method in-place in the field; ldfld + stloc + ldloca (if the field is readonly) vs ldflda (if it isn't marked readonly); this is because the compiler can't trust the method not to mutate the value. It can't even check whether all the fields on the struct are readonly, because that isn't enough: a struct method can rewrite this:
struct EvilStruct
{
readonly int _id;
public EvilStruct(int id) { _id = id; }
public void EvilMethod() { this = new EvilStruct(_id + 1); }
}
Because the compiler is trying to enforce the readonly nature of a field, if you have:
readonly EvilStruct _foo;
//...
_foo.EvilMethod();
it wants to ensure that the EvilMethod() can't overwrite _foo with a new value. Hence the gymnastics and the copy on the stack. Usually this has negligible impact, but if the struct is atypically large, then this can cause a performance problem. The same issue of guaranteeing that the value doesn't change also applies to the new in argument modifier in C# 7.2:
void SomeMethod(in EvilStruct value) {...}
where the caller wants to guarantee that it doesn't change the value (this is actually a ref EvilStruct, so changes would be propagated).
This issue is resolved in C# 7.2 by the addition of the readonly struct syntax - this tells the compiler that it is safe to invoke the method in-situ without having to make the extra stack copy:
readonly struct EvilStruct
{
readonly int _id;
public EvilStruct(int id) { _id = id; }
// the following method no longer compiles:
// CS1604 Cannot assign to 'this' because it is read-only
public void EvilMethod() { this = new EvilStruct(_id + 1); }
}
This entire scenario doesn't apply to List<T>, because that is a reference type, not a value type.
Using readonly, you can set the value of the field either in the declaration or in the constructor of the object that the field is a member of.
For a List, this means that only the reference to the List object is immutable (not the strings inside it). You can use readonly on a List field just to be sure that the field reference will not be overwritten.
For an example, see: Why does Microsoft advise against readonly fields with mutable values?
Of all the classes you can work with in .NET, Strings have by far the oddest behavior. String is designed to operate like a value type, rather than a reference type, in many cases. And that is before we add string specific stuff like string-interning to the mix:
Comparing Values for Equality in .NET: Identity and Equivalence
That all said, I do not know why anyone would mark a List<string> readonly, any more than a List<int> or a List<object>.
How much memory is allocated: That is unpredictable. The list will grow as you add items to it. And it will overallocate, to avoid excessive re-allocation. But the exact algorithm for that is a framework implementation detail/class version detail. If you know how many items you need ahead of time, either give the count in the constructor or just build the list from your static source collection (like an Array). This can be a minimal performance increase and is generally good practice. At the same time, string interning will try to limit how much memory is allocated to the actual string instances in memory, by reusing references.
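The over-allocation can be observed through the Capacity property. The following is a small sketch; the exact growth steps it prints are an implementation detail of the runtime you run it on, so no particular sequence is assumed here. The only behaviour relied upon is that a list created with an explicit capacity does not reallocate until that capacity is exceeded.

```csharp
using System;
using System.Collections.Generic;

public static class Program
{
    public static void Main()
    {
        // Default construction: watch the capacity change as items are added.
        var grown = new List<string>();
        int lastCapacity = grown.Capacity;
        Console.WriteLine("Initial capacity: " + lastCapacity);

        for (int i = 0; i < 100; i++)
        {
            grown.Add("item " + i);
            if (grown.Capacity != lastCapacity)
            {
                lastCapacity = grown.Capacity;
                Console.WriteLine("Count " + grown.Count + ": capacity is now " + lastCapacity);
            }
        }

        // Known size up front: one allocation of the backing array, no regrowth.
        var presized = new List<string>(100);
        for (int i = 0; i < 100; i++)
        {
            presized.Add("item " + i);
        }
        Console.WriteLine("Presized capacity: " + presized.Capacity); // 100
    }
}
```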

How does memory management work for a static generic list in c#?

From what I understand about static, whenever a static variable is declared, its memory gets allocated in RAM. Suppose we have a static integer, static int i = 5; then 4 bytes of memory will be occupied somewhere in the computer. And the same will happen if I have a static class or any reference type.
But my question is: if I declare a generic list like List<string> in C# and make it static, how much memory will be allocated for this list? I assume that if I add items to this list, it will require some more memory.
So it breaks my concept of static: that a static field has a fixed memory allocation at the time of declaration and that this cannot change during the application lifetime.
Can some C# genius help me out here?
There's no difference in how static members are allocated compared to non-static ones. "Static" just means that the member belongs to the class itself rather than to any single instance, so all instances share it.
As for the List<>: all reference-type objects you instantiate with the "new" keyword are created in a part of memory called the heap. So is the static list you are asking about.
A list in .NET is backed by an array of a certain length. Whenever that array gets filled by adding items to the list, a new, larger array is allocated, the existing items are copied into it, and the old array is left for the garbage collector. In this way the list can grow.
You're making a few assumptions about how .NET does memory management.
Under the hood (and I'd recommend looking) List uses an array to store its data. In current implementations the backing array starts out empty and grows to a capacity of 4 on the first Add unless you specify a capacity, so you'll have a pointer to the array plus its length multiplied by the size of the element type. The amount of memory used initially depends on what size the array is when you instantiate the List.
E.g. if you have List<int> then you have a memory pointer for the List instance, a memory pointer for the array, and whatever capacity you set in the constructor multiplied by the amount of memory required for the data type T. All of this is put in the Gen0 generation of the managed heap initially, and memory is allocated, deallocated, and promoted to Gen1 and Gen2 as you populate, depopulate, and use the List.
Given all of the above, there is no definitive answer unless the question is refined, e.g. "How much memory is allocated when I instantiate List<int>(5)?"
As for static, that's pretty much moot as the same amount of memory has to be allocated for the instance.
From a different angle, maybe the way to help would be to explain just what 'static' is in .net.
Here's a simple class:
public class MyClass
{
public string Zeus;
public static string Hades;
}
Okay, so what does that 'static' mean for our Hades string? Static basically means: it only exists in one place - it doesn't matter how many instances of the class you make, there's only going to be one Hades string.
MyClass first = new MyClass();
MyClass second = new MyClass();
MyClass third = new MyClass();
... there are now three Zeus strings. One for each of those MyClasses:
first.Zeus = "first";
second.Zeus = "second";
third.Zeus = "third";
... but there's only one Hades:
MyClass.Hades = "only version";
Notice how I didn't put 'first.Hades', or 'second.Hades'? That's because, since there's only one version, I don't have to go through an instance to get to it. In fact, Visual Studio will flat out tell you, "I can't do this - you're trying to get to a static variable, but you're trying to get to it through an actual instance of your class."
Instead, you just use: MyClass.Hades.
So, getting back to your memory question?
public class MyClass
{
public List<string> Zeus;
public static List<string> Hades;
}
The way those Lists are stored really isn't any different. The only difference is, you'll always have one List for your static Hades variable... and you'll have a Zeus List for every MyClass you create (that hasn't been garbage collected).
Make sense? It's kinda important to get this concept down, because it'll come into play a lot for stuff like caching or having a Singleton global object.
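Here is a short sketch of the same idea with lists. It reuses the MyClass, Zeus and Hades names from above; initializing the fields inline is an assumption made so the demo can add items straight away.

```csharp
using System;
using System.Collections.Generic;

public class MyClass
{
    // One list per instance.
    public List<string> Zeus = new List<string>();

    // One list for the whole type, shared by every instance.
    public static List<string> Hades = new List<string>();
}

public static class Program
{
    public static void Main()
    {
        var first = new MyClass();
        var second = new MyClass();

        first.Zeus.Add("only in first");

        // The static list grows on the heap just like any other list.
        MyClass.Hades.Add("shared");
        MyClass.Hades.Add("still shared");

        Console.WriteLine(first.Zeus.Count);    // 1
        Console.WriteLine(second.Zeus.Count);   // 0
        Console.WriteLine(MyClass.Hades.Count); // 2
    }
}
```

The static field holds a fixed-size reference; the list object it points to lives on the heap and can grow freely.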

Is there a List<T> like dynamic array that allows access to the internal array data in .NET?

Looking over the source of List<T>, it seems that there's no good way to access the private _items array of items.
What I need is basically a dynamic list of structs, which I can then modify in place. From my understanding, because C# 6 doesn't yet support ref return types, you can't have a List<T> return a reference to an element, which requires copying of the whole item, for example:
struct A {
public int X;
}
void Foo() {
var list = new List<A> { new A { X = 3 } };
list[0].X++; // this fails to compile, because the indexer returns a copy
// a proper way to do this would be
var copy = list[0];
copy.X++;
list[0] = copy;
var array = new A[] { new A { X = 3 } };
array[0].X++; // this works just fine
}
Looking at this, it's both clunky from syntax point of view, and possibly much slower than modifying the data in place (Unless the JIT can do some magic optimizations for this specific case? But I doubt they could be relied on in the general case, unless it's a special standardized optimization?)
Now if List<T>._items was protected, one could at least subclass List<T> and create a data structure with specific modify operations available. Is there another data structure in .NET that allows this, or do I have to implement my own dynamic array?
EDIT: I do not want any form of boxing or introducing any form of reference semantics. This code is intended for very high performance, and the reason I'm using an array of structs is to have them tightly packed in memory (and not scattered around the heap, resulting in cache misses).
I want to modify the structs in place because it's part of a performance critical algorithm that stores some of its data in those structs.
Is there another data structure in .NET that allows this, or do I have to implement my own dynamic array?
Neither.
There isn't, and can't be, a data structure in .NET that avoids the structure copy, because deep integration with the C# language is needed to get around the "indexed getter makes a copy" issue. So you're right to think in terms of directly accessing the array.
But you don't have to build your own dynamic array from scratch. Many List<T>-like operations such as Resize and bulk movement of items are provided for you as static methods on type System.Array. They come in generic flavors, so no boxing is involved.
The unfortunate thing is that the high-performance Buffer.BlockCopy, which should work on any blittable type, actually contains a hard-coded check for primitive types and refuses to work on any structure.
So just go with T[] (plus int Count -- array length isn't good enough because trying to keep capacity equal to count is very inefficient) and use System.Array static methods when you would otherwise use methods of List<T>. If you wrap this as a PublicList<T> class, you can get reusability and both the convenience of methods for Add, Insert, Sort as well as direct element access by indexing directly on the array. Just exercise some restraint and never store the handle to the internal array, because it will become out-of-date the next time the list needs to grow its capacity. Immediate direct access is perfectly fine though.
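A minimal sketch of such a wrapper follows. The PublicList<T> name comes from the paragraph above; the Items property, the starting capacity of 4 and the doubling growth policy are assumptions made for illustration.

```csharp
using System;

public class PublicList<T>
{
    // Backing storage; its length is the capacity, not the item count.
    private T[] _items = new T[4];

    public int Count { get; private set; }

    // Direct access to the backing array. Use it immediately and do not
    // hold on to it: Add may replace the array when the list grows.
    public T[] Items => _items;

    public void Add(T item)
    {
        if (Count == _items.Length)
        {
            // Array.Resize allocates a new array and copies the old elements.
            Array.Resize(ref _items, _items.Length * 2);
        }
        _items[Count++] = item;
    }
}

public struct A
{
    public int X;
}

public static class Program
{
    public static void Main()
    {
        var list = new PublicList<A>();
        list.Add(new A { X = 3 });

        // Array element access yields a variable, so this modifies in place.
        list.Items[0].X++;
        Console.WriteLine(list.Items[0].X); // 4

        // Growing past the initial capacity keeps the existing items.
        for (int i = 0; i < 10; i++)
        {
            list.Add(new A { X = i });
        }
        Console.WriteLine(list.Count);      // 11
        Console.WriteLine(list.Items[0].X); // 4
    }
}
```

Only indexes below Count hold real items; the slots beyond that are default values waiting to be filled.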

Storing a C# reference to an array of structs and retrieving it - possible without copying?

UPDATE: the next version of C# has a feature under consideration that would directly answer this issue. c.f. answers below.
Requirements:
App data is stored in arrays-of-structs. There is one AoS for each type of data in the app (e.g. one for MyStruct1, another for MyStruct2, etc)
The structs are created at runtime; the more code we write in the app, the more there will be.
I need one class to hold references to ALL the AoS's, and allow me to set and get individual structs within those AoS's
The AoS's tend to be large (1,000's of structs per array); copying those AoS's around would be a total fail - they should never be copied! (they never need to!)
I have code that compiles and runs, and it works ... but is C# silently copying the AoS's under the hood every time I access them? (see below for full source)
public Dictionary<System.Type, System.Array> structArraysByType = new Dictionary<System.Type, System.Array>();
public void registerStruct<T>()
{
System.Type newType = typeof(T);
if( ! structArraysByType.ContainsKey(newType ) )
{
structArraysByType.Add(newType, new T[1000] ); // allowing up to 1k
}
}
public T get<T>( int index )
{
return ((T[])structArraysByType[typeof(T)])[index];
}
public void set<T>( int index, T newValue )
{
((T[])structArraysByType[typeof(T)])[index] = newValue;
}
Notes:
I need to ensure C# sees this as an array of value-types, instead of an array of objects ("don't you DARE go making an array of boxed objects around my structs!"). As I understand it: Generic T[] ensures that (as expected)
I couldn't figure out how to express the type "this will be an array of structs, but I can't tell you which structs at compile time" other than System.Array. System.Array works -- but maybe there are alternatives?
In order to index the resulting array, I have to typecast back to T[]. I am scared that this typecast MIGHT be boxing the Array-of-Structs; I know that if it were (T) instead of (T[]), it would definitely box; hopefully it doesn't do that with T[] ?
Alternatively, I can use the System.Array methods, which definitely boxes the incoming and outgoing struct. This is a fairly major problem (although I could workaround it if were the only way to make C# work with Array-of-struct)
As far as I can see, what you are doing should work fine, but yes it will return a copy of a struct T instance when you call Get, and perform a replacement using a stack based instance when you call Set. Unless your structs are huge, this should not be a problem.
If they are huge and you want to:
Read (some) properties of a struct instance in your array without creating a copy of it.
Update some of its fields (which means your structs are not immutable; that is generally a bad idea, but there are good reasons for doing it).
then you can add the following to your class:
public delegate void Accessor<T>(ref T item) where T : struct;
public delegate TResult Projector<T, TResult>(ref T item) where T : struct;
public void Access<T>(int index, Accessor<T> accessor) where T : struct
{
var array = (T[])structArraysByType[typeof(T)];
accessor(ref array[index]);
}
public TResult Project<T, TResult>(int index, Projector<T, TResult> projector) where T : struct
{
var array = (T[])structArraysByType[typeof(T)];
return projector(ref array[index]);
}
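As a usage sketch for the two delegates, the following wraps them in a self-contained class. StructStore, Register and the Hits field are hypothetical names introduced only for this example. Note that the generic methods need the same struct constraint as the delegates, and that lambdas with ref parameters must spell out the parameter type.

```csharp
using System;
using System.Collections.Generic;

public delegate void Accessor<T>(ref T item) where T : struct;
public delegate TResult Projector<T, TResult>(ref T item) where T : struct;

public struct MyThing
{
    public int Hits; // hypothetical field for the demo
}

public class StructStore
{
    private readonly Dictionary<Type, Array> structArraysByType =
        new Dictionary<Type, Array>();

    public void Register<T>() where T : struct
    {
        if (!structArraysByType.ContainsKey(typeof(T)))
        {
            structArraysByType.Add(typeof(T), new T[1000]);
        }
    }

    public void Access<T>(int index, Accessor<T> accessor) where T : struct
    {
        var array = (T[])structArraysByType[typeof(T)];
        accessor(ref array[index]); // passes the element itself, not a copy
    }

    public TResult Project<T, TResult>(int index, Projector<T, TResult> projector)
        where T : struct
    {
        var array = (T[])structArraysByType[typeof(T)];
        return projector(ref array[index]);
    }
}

public static class Program
{
    public static void Main()
    {
        var store = new StructStore();
        store.Register<MyThing>();

        // Update the struct in place, twice.
        store.Access<MyThing>(10, (ref MyThing t) => t.Hits++);
        store.Access<MyThing>(10, (ref MyThing t) => t.Hits++);

        // Read a single field without copying the whole struct out.
        int hits = store.Project<MyThing, int>(10, (ref MyThing t) => t.Hits);
        Console.WriteLine(hits); // 2
    }
}
```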
Or simply return a reference to the underlying array itself, if you don't need to abstract it / hide the fact that your class encapsulates them:
public T[] GetArray<T>()
{
return (T[])structArraysByType[typeof(T)];
}
From which you can then simply access the elements:
var myThingsArray = MyStructArraysType.GetArray<MyThing>();
var someFieldValue = myThingsArray[10].SomeField;
myThingsArray[3].AnotherField = "Hello";
Alternatively, if there is no specific reason for them to be structs (i.e. to ensure sequential cache friendly fast access), you might want to simply use classes.
There is a much better solution that is planned for adding to next version of C#, but does not yet exist in C# - the "return ref" feature of .NET already exists, but isn't supported by the C# compiler.
Here's the Issue for tracking that feature: https://github.com/dotnet/roslyn/issues/118
With that, the entire problem becomes trivial "return ref the result".
(answer added for future, when the existing answer will become outdated (I hope), and because there's still time to comment on that proposal / add to it / improve it!)
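Ref returns and ref locals did ship in C# 7.0. A sketch of what the getter can look like with them is below; StructRegistry, Register and the Hits field are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;

public struct MyThing
{
    public int Hits; // hypothetical field for the demo
}

public class StructRegistry
{
    private readonly Dictionary<Type, Array> structArraysByType =
        new Dictionary<Type, Array>();

    public void Register<T>() where T : struct
    {
        if (!structArraysByType.ContainsKey(typeof(T)))
        {
            structArraysByType.Add(typeof(T), new T[1000]);
        }
    }

    // Returns a reference to the array element itself, so no struct is copied.
    public ref T Get<T>(int index) where T : struct
    {
        var array = (T[])structArraysByType[typeof(T)];
        return ref array[index];
    }
}

public static class Program
{
    public static void Main()
    {
        var registry = new StructRegistry();
        registry.Register<MyThing>();

        // A ref local aliases the element stored in the array.
        ref MyThing thing = ref registry.Get<MyThing>(3);
        thing.Hits = 7;

        // Reading through a fresh call sees the in-place change.
        Console.WriteLine(registry.Get<MyThing>(3).Hits); // 7
    }
}
```

If the caller writes `var thing = registry.Get<MyThing>(3);` without the ref keywords, the struct is copied as before, so both sides have to opt in.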

Jon Skeet's Edulinq - Empty Array Caching

I was going through Edulinq by Jon Skeet, and I came across the following code (page 23), in which he implements a caching mechanism for the Empty() operator of LINQ:
private static class EmptyHolder<T>
{
internal static readonly T[] Array = new T[0];
}
My question is, how does this actually cache the Array variable?
Optionally, How does it work in CLR?
Edit: Also following that, he mentions there was a revolt against returning an array. Why should anybody not return an array (even if it is 0 sized?)?
My question is, how does this actually cache the Array variable?
The CLR caches it per type argument. Basically, EmptyHolder<int> is a different type to EmptyHolder<string> etc, and the type initializer is invoked (automatically, by the CLR) once per concrete type.
So:
var x = EmptyHolder<string>.Array; // Needs to construct the empty string[] array
var y = EmptyHolder<string>.Array; // No extra work! x and y have the same value
var z = EmptyHolder<int>.Array; // This constructs an empty array for int[]
Optionally, How does it work in CLR?
That's an implementation detail that I don't know much about, I'm afraid. But basically this is all about how the CLR does things :)
Edit: Also following that, he mentions there was a revolt against returning an array. Why should anybody not return an array (even if it is 0 sized?)?
Well, there was a comment of:
The array method is not so great: People will incorrectly depend on the return value being an array although this is not documented.
Personally I don't think it's an issue, but it was fun to write the alternative implementation :)
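The per-type caching can be checked with reference comparisons. This sketch declares the holder as a top-level internal class so it compiles on its own; in the original it is a private nested class.

```csharp
using System;

internal static class EmptyHolder<T>
{
    // Initialized once per constructed type (EmptyHolder<string>, EmptyHolder<int>, ...).
    internal static readonly T[] Array = new T[0];
}

public static class Program
{
    public static void Main()
    {
        string[] x = EmptyHolder<string>.Array;
        string[] y = EmptyHolder<string>.Array;
        int[] z = EmptyHolder<int>.Array;

        // Same type argument: the very same array instance comes back.
        Console.WriteLine(ReferenceEquals(x, y)); // True

        // Different type argument: a different static field, so a different array.
        Console.WriteLine(ReferenceEquals(x, z)); // False

        // Without the holder, every "new" produces a fresh empty array.
        Console.WriteLine(ReferenceEquals(new string[0], new string[0])); // False
    }
}
```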
The first time you use EmptyHolder<T> for a given T, the CLR has to run the static constructor for that EmptyHolder<T>.
Now it looks like there is no static constructor, right? Wrong. The class is roughly equivalent to...
private static class EmptyHolder<T>
{
// A static constructor takes no type parameters and no access modifier.
static EmptyHolder()
{
Array = new T[0];
}
internal static readonly T[] Array;
}
Now, subsequent calls to Empty will not invoke the static constructor (unless a different T is used).
Far be it from me to criticize Jon Skeet, but this is a tiny optimization to worry about.
