Jon Skeet's Edulinq - Empty Array Caching


I was going through Edulinq by Jon Skeet, and I came across the following code (page 23), in which he implements a caching mechanism for the Empty() operator of LINQ:
private static class EmptyHolder<T>
{
internal static readonly T[] Array = new T[0];
}
My question is, how does this actually cache the Array variable?
Optionally, How does it work in CLR?
Edit: Also following that, he mentions there was a revolt against returning an array. Why should anybody not return an array (even if it is 0 sized?)?

My question is, how does this actually cache the Array variable?
The CLR caches it per type argument. Basically, EmptyHolder<int> is a different type to EmptyHolder<string> etc, and the type initializer is invoked (automatically, by the CLR) once per concrete type.
So:
var x = EmptyHolder<string>.Array; // Needs to construct the empty string[] array
var y = EmptyHolder<string>.Array; // No extra work! x and y have the same value
var z = EmptyHolder<int>.Array; // This constructs an empty array for int[]
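To make the per-type caching visible, here is a minimal runnable sketch. It assumes the holder is declared at top level (internal rather than private, since a private class must be nested), and the Demo class is only there for illustration.

```csharp
using System;

// Top-level here for brevity; in the original code the holder is a private nested class.
internal static class EmptyHolder<T>
{
    internal static readonly T[] Array = new T[0];
}

internal static class Demo
{
    private static void Main()
    {
        string[] x = EmptyHolder<string>.Array;
        string[] y = EmptyHolder<string>.Array;
        int[] z = EmptyHolder<int>.Array;

        // Same closed type, so both reads see the one cached array.
        Console.WriteLine(object.ReferenceEquals(x, y)); // True
        // A different type argument means a different static field, hence a different array.
        Console.WriteLine(object.ReferenceEquals(x, z)); // False
        Console.WriteLine(z.Length);                     // 0
    }
}
```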
Optionally, How does it work in CLR?
That's an implementation detail that I don't know much about, I'm afraid. But basically this is all about how the CLR does things :)
Edit: Also following that, he mentions there was a revolt against returning an array. Why should anybody not return an array (even if it is 0 sized?)?
Well, there was a comment of:
The array method is not so great: People will incorrectly depend on the return value being an array although this is not documented.
Personally I don't think it's an issue, but it was fun to write the alternative implementation :)

The first time you call Empty() for a given T, the CLR has to run the static constructor for EmptyHolder<T>.
Now it looks like there is no static constructor, right? Wrong. The class is roughly equivalent to:
private static class EmptyHolder<T>
{
// A static constructor takes no type parameters and no access modifier.
static EmptyHolder()
{
Array = new T[0];
}
internal static readonly T[] Array;
// The Empty() operator itself is declared outside this holder class.
}
Now, subsequent runs of Empty will not invoke the static constructor (unless a different T is used).
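One way to see this for yourself is to put a trace line in an explicit static constructor. The sketch below is illustrative only: the Console.WriteLine call and the Demo class are additions, and the holder is made top-level and internal so the file compiles on its own.

```csharp
using System;

internal static class EmptyHolder<T>
{
    internal static readonly T[] Array;

    static EmptyHolder()
    {
        // Runs at most once per closed type, before the first access to Array.
        Console.WriteLine("Initializing EmptyHolder<" + typeof(T).Name + ">");
        Array = new T[0];
    }
}

internal static class Demo
{
    private static void Main()
    {
        var a = EmptyHolder<string>.Array; // prints "Initializing EmptyHolder<String>"
        var b = EmptyHolder<string>.Array; // prints nothing: already initialized
        var c = EmptyHolder<int>.Array;    // prints "Initializing EmptyHolder<Int32>"

        Console.WriteLine(object.ReferenceEquals(a, b)); // True
        Console.WriteLine(c.Length);                     // 0
    }
}
```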
Far be it from me to criticise Jon Skeet, but this is a tiny optimization to worry about.

Related

Properties should not return arrays

Yes, I know this has been discussed many times before, and I read all the posts and comments regarding this question, but still can't seem to understand something.
One of the options that MSDN offers to solve this violation, is by returning a collection (or an interface which is implemented by a collection) when accessing the property, however clearly it does not solve the problem because most collections are not immutable and can also be changed.
Another possibility I've seen in the answers and comments to this question is to encapsulate the array with a ReadOnlyCollection and return it or a base interface of it(like IReadOnlyCollection), but I don't understand how this solves the performance issue.
If at any time the property is referenced it needs to allocate memory for a new ReadOnlyCollection that encapsulates the array, so what is the difference (in a manner of performance issues, not editing the array/collection) than simply returning a copy of the original array?
Moreover, ReadOnlyCollection has only one constructor with IList argument so there's a need to wrap the array with a list prior to creating it.
If I intentionally want to work with an array inside my class (not as immutable collection), is the performance better when I allocate new memory for a ReadOnlyCollection and encapsulate my array with it instead of returning a copy of the array?
Please clarify this.

If at any time the property is referenced it needs to allocate memory for a new ReadOnlyCollection that encapsulates the array, so what is the difference (in a manner of performance issues, not editing the array/collection) than simply returning a copy of the original array?
A ReadOnlyCollection<T> wraps a collection - it doesn't copy the collection.
Consider:
public class Foo
{
private readonly int[] array; // Initialized in constructor
public IReadOnlyList<int> Array => array.ToArray(); // Copy
public IReadOnlyList<int> Wrapper => new ReadOnlyCollection<int>(array); // Wrap
}
Imagine your array contains a million entries. Consider the amount of work that the Array property has to do - it's got to take a copy of all million entries. Consider the amount of work that the Wrapper property has to do - it's got to create an object which just contains a reference.
Additionally, if you don't mind a small extra memory hit, you can do it once instead:
public class Foo
{
private readonly int[] array; // Initialized in constructor
public IReadOnlyList<int> Wrapper { get; } // Getter-only auto-property; assigned once in the constructor
public Foo(...)
{
array = ...;
Wrapper = new ReadOnlyCollection<int>(array);
}
}
Now accessing the Wrapper property doesn't involve any allocation at all - it doesn't matter if all callers see the same wrapper, because they can't mutate it.
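The difference between copying and wrapping can be checked with a small sketch. The variable names are illustrative; note that an int[] can be passed straight to the ReadOnlyCollection<int> constructor, because arrays implement IList<T>.

```csharp
using System;
using System.Collections.ObjectModel;
using System.Linq;

internal static class Demo
{
    private static void Main()
    {
        int[] array = { 1, 2, 3 };

        // O(n): allocates a new array and copies every element.
        int[] copy = array.ToArray();
        // O(1): allocates one small object that holds a reference to the array.
        var wrapper = new ReadOnlyCollection<int>(array);

        // The owning code can still change its own array afterwards.
        array[0] = 42;

        Console.WriteLine(copy[0]);    // 1  (the copy is a snapshot)
        Console.WriteLine(wrapper[0]); // 42 (the wrapper is a live, read-only view)
    }
}
```

The wrapper being a live view is usually what you want for a property, but it does mean callers observe later changes made by the owning class.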

You have no need to copy an array, just return it as IReadOnlyCollection<T>:
public class MyClass {
private int[] myArray = ...;
public IReadOnlyCollection<int> MyArray {
get {
return myArray;
}
}
}
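One caveat with this approach is worth a sketch: the returned object is still the array itself, so a caller who casts it back can modify it. The initial values and the Demo class below are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public class MyClass
{
    private readonly int[] myArray = { 1, 2, 3 }; // illustrative contents

    public IReadOnlyCollection<int> MyArray
    {
        get { return myArray; }
    }
}

internal static class Demo
{
    private static void Main()
    {
        var holder = new MyClass();
        IReadOnlyCollection<int> view = holder.MyArray;

        // The static type is read-only, but the runtime type is still int[].
        if (view is int[] raw)
        {
            raw[0] = 99;
        }

        Console.WriteLine(string.Join(",", holder.MyArray)); // 99,2,3
    }
}
```

If that matters, wrapping the array in a ReadOnlyCollection<int> closes this loophole at the cost of one extra object.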

LINQ Concatenation with a single extra element

If we want a single IEnumerable<T> representing the concatenation of two IEnumerable<T>s we can use the LINQ Concat() method.
For Example:
int[] a = new int[] { 1, 2, 3 };
int[] b = new int[] { 4, 5 };
foreach (int i in a.Concat(b))
{
Console.Write(i);
}
of course outputs 12345.
My question is, why is there no overload of Concat() just accepting a single element of type T such that:
int[] a = new int[] { 1, 2, 3 };
foreach (int i in a.Concat(4))
{
Console.Write(i);
}
would compile and produce the output: 1234?
Googling around the issue throws up a couple of SO questions where the accepted answer suggests that the best approach when looking to achieve this is to simply do a.Concat(new int[] {4}). Which is fine(ish) but a little 'unclean' in my opinion because:
Maybe there is a performance hit from declaring a new array (albeit this is presumably going to be negligible pretty much every time)
It just doesn't look as neat, easy to read and natural as a.Concat(4)
Anyone know why such an overload doesn't exist?
Also, assuming my Googling hasn't let me down - there is no such similar LINQ extension method taking a single element of type T.
(I understand it is trivially easy to roll one's own extension method to produce this effect - but doesn't that just make the omission even more odd? I suspect there will be a reason for its omission but can't imagine what it could be.)
UPDATE:
Acknowledging the couple of votes to close this as opinion based - I should clarify that I am NOT seeking people's opinions on whether this would be a good addition to LINQ.
More I am seeking to understand the FACTUAL reasons why it is not ALREADY part of LINQ.

In .NET Framework 4.7.1 they added the Prepend and Append methods, which add one element to the beginning and to the end of an enumerable respectively.
Usage:
var emptySequence = Enumerable.Empty<long>();
var singleElementSequence = emptySequence.Append(256L);
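Applied to the example from the question, a short sketch (assuming a target framework that ships Enumerable.Append and Enumerable.Prepend) might look like this:

```csharp
using System;
using System.Linq;

internal static class Demo
{
    private static void Main()
    {
        int[] a = { 1, 2, 3 };

        // Both calls are lazy and return new sequences; the array itself is untouched.
        var result = a.Prepend(0).Append(4);

        Console.WriteLine(string.Join("", result)); // 01234
        Console.WriteLine(a.Length);                // 3
    }
}
```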

A good reason for inclusion (in one form or another) would be for IEnumerables to be more like functional sequence monads.
But since LINQ did not arrive until .NET 3.5, and is implemented mostly using extension methods, I can imagine that they omitted extension methods working on a single element of T. Still this is pure speculation on my part.
They did however include generator functions, that are not extension methods. Specifically the following:
Enumerable.Empty
Enumerable.Repeat
Enumerable.Range
You could use these instead of homebrew extension methods. The two use cases you mentioned, can be solved as:
int[] a = new int[] { 1, 2, 3 };
var myPrependedEnumerable = Enumerable.Repeat(0, 1).Concat(a);
var myAppendedEnumerable = a.Concat(Enumerable.Repeat(4, 1));
It might have been nice if an additional overload was included as syntactical sugar.
Enumerable.FromElement(x); // or a better name (see below).
The absence of an explicit Unit function is curious and interesting
In the interesting "More LINQ with System.Interactive" series of blog posts by Bart De Smet, illustrated using System.Linq.EnumerableEx, the post "More LINQ with System.Interactive – Sequences under construction" deals specifically with this question, using the following appropriately named method for constructing a single-element IEnumerable.
public static IEnumerable<TSource> Return<TSource>(TSource value);
This is nothing but the return function (sometimes referred to as unit) used on a monad.
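For readers who want such a unit function without pulling in another library, a minimal sketch follows. EnumerableHelpers is a hypothetical class name, and this iterator-based body is just one possible implementation, not the one used by System.Interactive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

internal static class EnumerableHelpers
{
    // Hypothetical helper named after the Return method quoted above.
    internal static IEnumerable<T> Return<T>(T value)
    {
        yield return value;
    }
}

internal static class Demo
{
    private static void Main()
    {
        int[] a = { 1, 2, 3 };

        var appended = a.Concat(EnumerableHelpers.Return(4));
        var prepended = EnumerableHelpers.Return(0).Concat(a);

        Console.WriteLine(string.Join("", appended));  // 1234
        Console.WriteLine(string.Join("", prepended)); // 0123
    }
}
```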
Also interesting is the blog series by Eric Lippert on monads, which features the following quote in part eight:
IEnumerable<int> sequence = Enumerable.Repeat<int>(123, 1);
And frankly, that last one is a bit dodgy. I wish there was a static method on Enumerable specifically for making a one-element sequence.
Furthermore, the F# language provides the seq type:
Sequences are represented by the seq<'T> type, which is an alias for IEnumerable. Therefore, any .NET Framework type that implements System.IEnumerable can be used as a sequence.
It provides an explicit unit function as Seq.singleton.
Concluding
None of this gives us facts about why these sequence constructs are not explicitly present in C#. Until someone with knowledge of the design decision process shares that information, it at least highlights that the question is worth knowing more about.

First - your Googling is fine - there is no such method. I like the idea though. It's a use case where, if you run into it, having the method would be great.
I suspect it wasn't included with the LINQ API because the designers didn't see a common enough need for it. That's just my conjecture though.
You're right to say that creating an array with just one element isn't all that intuitive. You can get the feel and performance you're going for with this:
public static class EnumerableExtensions {
public static IEnumerable<T> Concat<T>(this IEnumerable<T> source, T element) {
foreach (var e in source) {
yield return e;
}
yield return element;
}
public static IEnumerable<T> Concat<T>(this T source, IEnumerable<T> element) {
yield return source;
foreach (var e in element) {
yield return e;
}
}
}
class Program
{
static void Main()
{
List<int> ints = new List<int> {1, 2, 3};
var startingInt = 0;
foreach (var i in startingInt.Concat(ints).Concat(4)) {
Console.WriteLine(i);
}
}
}
Output:
0
1
2
3
4
Lazy evaluation
Implemented similarly to the built-in LINQ methods (they actually return an internal iterator, instead of directly yielding)
Argument checking wouldn't hurt it
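The last two notes can be combined: splitting the method into an eager wrapper and a private iterator lets the null check run at call time instead of on first enumeration. This is a sketch under that assumption; ConcatItem and ConcatItemIterator are hypothetical names chosen to avoid clashing with the built-in methods.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Not an iterator itself, so the argument check runs as soon as the method is called.
    public static IEnumerable<T> ConcatItem<T>(this IEnumerable<T> source, T element)
    {
        if (source == null)
        {
            throw new ArgumentNullException(nameof(source));
        }
        return ConcatItemIterator(source, element);
    }

    // The iterator body stays lazy: nothing here runs until the result is enumerated.
    private static IEnumerable<T> ConcatItemIterator<T>(IEnumerable<T> source, T element)
    {
        foreach (var e in source)
        {
            yield return e;
        }
        yield return element;
    }
}

internal static class Demo
{
    private static void Main()
    {
        var ints = new List<int> { 1, 2, 3 };
        Console.WriteLine(string.Join(",", ints.ConcatItem(4))); // 1,2,3,4

        try
        {
            IEnumerable<int> missing = null;
            missing.ConcatItem(4); // throws here, even though the result is never enumerated
        }
        catch (ArgumentNullException)
        {
            Console.WriteLine("threw before enumeration");
        }
    }
}
```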

Philosophical questions, I like that.
First, you can easily create that behaviour with an extension method:
public static IEnumerable<TSource> Concat<TSource>(this IEnumerable<TSource> source, TSource element)
{
return source.Concat(new[]{element});
}
I think the central point is that IEnumerable is an immutable interface and it is not meant to be modified on the fly.
This could (I say could because I do not work at Microsoft, so I may be completely wrong) be the reason why the modifying part of IEnumerable is not so well developed (in the sense that you're missing some handy methods).
If you have to modify that collection, consider using a List or another interface.

Storing a C# reference to an array of structs and retrieving it - possible without copying?

UPDATE: the next version of C# has a feature under consideration that would directly answer this issue. c.f. answers below.
Requirements:
App data is stored in arrays-of-structs. There is one AoS for each type of data in the app (e.g. one for MyStruct1, another for MyStruct2, etc)
The structs are created at runtime; the more code we write in the app, the more there will be.
I need one class to hold references to ALL the AoS's, and allow me to set and get individual structs within those AoS's
The AoS's tend to be large (1,000's of structs per array); copying those AoS's around would be a total fail - they should never be copied! (they never need to!)
I have code that compiles and runs, and it works ... but is C# silently copying the AoS's under the hood every time I access them? (see below for full source)
public Dictionary<System.Type, System.Array> structArraysByType;
public void registerStruct<T>()
{
System.Type newType = typeof(T);
if( ! structArraysByType.ContainsKey(newType ) )
{
structArraysByType.Add(newType, new T[1000] ); // allowing up to 1k
}
}
public T get<T>( int index )
{
return ((T[])structArraysByType[typeof(T)])[index];
}
public void set<T>( int index, T newValue )
{
((T[])structArraysByType[typeof(T)])[index] = newValue;
}
Notes:
I need to ensure C# sees this as an array of value-types, instead of an array of objects ("don't you DARE go making an array of boxed objects around my structs!"). As I understand it: Generic T[] ensures that (as expected)
I couldn't figure out how to express the type "this will be an array of structs, but I can't tell you which structs at compile time" other than System.Array. System.Array works -- but maybe there are alternatives?
In order to index the resulting array, I have to typecast back to T[]. I am scared that this typecast MIGHT be boxing the Array-of-Structs; I know that if it were (T) instead of (T[]), it would definitely box; hopefully it doesn't do that with T[] ?
Alternatively, I can use the System.Array methods, which definitely box the incoming and outgoing struct. This is a fairly major problem (although I could work around it if it were the only way to make C# work with Array-of-struct)

As far as I can see, what you are doing should work fine, but yes it will return a copy of a struct T instance when you call Get, and perform a replacement using a stack based instance when you call Set. Unless your structs are huge, this should not be a problem.
If they are huge and you want to:
Read (some) properties of a struct instance in your array without creating a copy of it.
Update some of its fields (this assumes your structs are mutable, which is generally a bad idea, although there can be good reasons for it)
then you can add the following to your class:
public delegate void Accessor<T>(ref T item) where T : struct;
public delegate TResult Projector<T, TResult>(ref T item) where T : struct;
public void Access<T>(int index, Accessor<T> accessor)
{
var array = (T[])structArraysByType[typeof(T)];
accessor(ref array[index]);
}
public TResult Project<T, TResult>(int index, Projector<T, TResult> projector)
{
var array = (T[])structArraysByType[typeof(T)];
return projector(ref array[index]);
}
Or simply return a reference to the underlying array itself, if you don't need to abstract it / hide the fact that your class encapsulates them:
public T[] GetArray<T>()
{
return (T[])structArraysByType[typeof(T)];
}
From which you can then simply access the elements:
var myThingsArray = MyStructArraysType.GetArray<MyThing>();
var someFieldValue = myThingsArray[10].SomeField;
myThingsArray[3].AnotherField = "Hello";
Alternatively, if there is no specific reason for them to be structs (i.e. to ensure sequential cache friendly fast access), you might want to simply use classes.
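On the boxing worry from the question: casting a System.Array reference back to T[] is a reference conversion, so it neither copies nor boxes the array. The sketch below illustrates that with a made-up MyThing struct; only reading a single element into a local variable produces a copy, and that copy is of one struct.

```csharp
using System;

internal struct MyThing
{
    public int SomeField;
}

internal static class Demo
{
    private static void Main()
    {
        Array untyped = new MyThing[1000];
        MyThing[] typed = (MyThing[])untyped;

        // The cast yields the very same array object.
        Console.WriteLine(object.ReferenceEquals(untyped, typed)); // True

        // Indexing the typed array updates the element in place.
        typed[10].SomeField = 5;

        // Reading an element into a local copies that one struct only.
        MyThing copy = typed[10];
        copy.SomeField = 6;

        Console.WriteLine(((MyThing[])untyped)[10].SomeField); // 5
    }
}
```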

There is a much better solution that is planned for the next version of C#, but does not yet exist in the language: the "return ref" feature already exists in .NET, but isn't supported by the C# compiler.
Here's the Issue for tracking that feature: https://github.com/dotnet/roslyn/issues/118
With that, the entire problem becomes trivial "return ref the result".
(answer added for future, when the existing answer will become outdated (I hope), and because there's still time to comment on that proposal / add to it / improve it!)
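Ref returns did later ship in C# 7.0. A sketch of how the class from the question could use them is below; StructStore, Register, GetRef and MyThing are illustrative names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

internal struct MyThing
{
    public int Value;
}

internal sealed class StructStore
{
    private readonly Dictionary<Type, Array> structArraysByType = new Dictionary<Type, Array>();

    public void Register<T>() where T : struct
    {
        if (!structArraysByType.ContainsKey(typeof(T)))
        {
            structArraysByType.Add(typeof(T), new T[1000]); // same 1k capacity as the question
        }
    }

    // Returns a reference to the array slot itself, so no struct is copied.
    public ref T GetRef<T>(int index) where T : struct
    {
        var array = (T[])structArraysByType[typeof(T)];
        return ref array[index];
    }
}

internal static class Demo
{
    private static void Main()
    {
        var store = new StructStore();
        store.Register<MyThing>();

        // Write straight into the stored element.
        store.GetRef<MyThing>(3).Value = 7;

        // A ref local aliases the same slot.
        ref MyThing thing = ref store.GetRef<MyThing>(3);
        thing.Value++;

        Console.WriteLine(store.GetRef<MyThing>(3).Value); // 8
    }
}
```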

Impure method is called for readonly field

I'm using Visual Studio 2010 + ReSharper and it shows a warning on the following code:
if (rect.Contains(point))
{
...
}
rect is a readonly Rectangle field, and ReSharper shows me this warning:
"Impure Method is called for readonly field of value type."
What are impure methods and why is this warning being shown to me?

First off, Jon, Michael and Jared's answers are essentially correct but I have a few more things I'd like to add to them.
What is meant by an "impure" method?
It is easier to characterize pure methods. A "pure" method has the following characteristics:
Its output is entirely determined by its input; its output does not depend on externalities like the time of day or the bits on your hard disk. Its output does not depend on its history; calling the method with a given argument twice should give the same result.
A pure method produces no observable mutations in the world around it. A pure method may choose to mutate private state for efficiency's sake, but a pure method does not, say, mutate a field of its argument.
For example, Math.Cos is a pure method. Its output depends only on its input, and the input is not changed by the call.
An impure method is a method which is not pure.
What are some of the dangers of passing readonly structs to impure methods?
There are two that come to mind. The first is the one pointed out by Jon, Michael and Jared, and this is the one that ReSharper is warning you about. When you call a method on a struct, we always pass a reference to the variable that is the receiver, in case the method wishes to mutate the variable.
So what if you call such a method on a value, rather than a variable? In that case, we make a temporary variable, copy the value into it, and pass a reference to the variable.
A readonly variable is considered a value, because it cannot be mutated outside the constructor. So we are copying the variable to another variable, and the impure method is possibly mutating the copy, when you intend it to mutate the variable.
That's the danger of passing a readonly struct as a receiver. There is also a danger of passing a struct that contains a readonly field. A struct that contains a readonly field is a common practice, but it is essentially writing a cheque that the type system does not have the funds to cash; the "read-only-ness" of a particular variable is determined by the owner of the storage. An instance of a reference type "owns" its own storage, but an instance of a value type does not!
struct S
{
private readonly int x;
public S(int x) { this.x = x; }
public void Badness(ref S s)
{
Console.WriteLine(this.x);
s = new S(this.x + 1);
// This should be the same, right?
Console.WriteLine(this.x);
}
}
One thinks that this.x is not going to change because x is a readonly field and Badness is not a constructor. But...
S s = new S(1);
s.Badness(ref s);
... clearly demonstrates the falsity of that. this and s refer to the same variable, and that variable is not readonly!

An impure method is one which isn't guaranteed to leave the value as it was.
In .NET 4, you can decorate methods and types with [Pure] to declare them to be pure, and R# will take notice of this. Unfortunately, you can't apply it to someone else's members, and you can't convince R# that a type/member is pure in a .NET 3.5 project as far as I'm aware. (This bites me in Noda Time all the time.)
The idea is that if you're calling a method which mutates a variable, but you call it on a read-only field, it's probably not doing what you want, so R# will warn you about this. For example:
public struct Nasty
{
public int value;
public void SetValue()
{
value = 10;
}
}
class Test
{
static readonly Nasty first;
static Nasty second;
static void Main()
{
first.SetValue();
second.SetValue();
Console.WriteLine(first.value); // 0
Console.WriteLine(second.value); // 10
}
}
This would be a really useful warning if every method which was actually pure was declared that way. Unfortunately they're not, so there are a lot of false positives :(
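As a sketch of what such an annotation looks like, the attribute lives in System.Diagnostics.Contracts. The Counter struct below is made up for illustration, and the attribute is only a declaration of intent: the compiler does not verify it, and how any given analyzer treats it is an assumption.

```csharp
using System;
using System.Diagnostics.Contracts;

public struct Counter
{
    public int Value;

    [Pure] // Declares that this method does not change observable state.
    public bool IsZero()
    {
        return Value == 0;
    }

    // Not pure: mutates the struct it is called on.
    public void Increment()
    {
        Value++;
    }
}

internal static class Demo
{
    private static readonly Counter Fixed = new Counter();
    private static Counter Free = new Counter();

    private static void Main()
    {
        Fixed.Increment(); // runs on a temporary copy; the readonly field is unchanged
        Free.Increment();  // runs on the field itself

        Console.WriteLine(Fixed.IsZero()); // True
        Console.WriteLine(Free.IsZero());  // False
    }
}
```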

The short answer is that this is a false positive, and you can safely ignore the warning.
The longer answer is that accessing a read-only value type creates a copy of it, so that any changes to the value made by a method would only affect the copy. ReSharper doesn't realize that Contains is a pure method (meaning it has no side effects). Eric Lippert talks about it here: Mutating Readonly Structs

It sounds like ReSharper believes that the method Contains can mutate the rect value. Because rect is a readonly field of a value type, the C# compiler makes a defensive copy of the value so that the method cannot mutate the readonly field. Essentially, the final code looks like this:
Rectangle temp = rect;
if (temp.Contains(point)) {
...
}
ReSharper is warning you here that Contains may mutate rect in a way that would be immediately lost because it happened on a temporary.

An Impure method is a method that could have side-effects. In this case, ReSharper seems to think it could change rect. It probably doesn't but the chain of evidence is broken.

Are methods that modify reference type parameters bad?

I've seen methods like this:
public void Foo(List<string> list)
{
list.Add("Bar");
}
Is this good practice to modify parameters in a method?
Wouldn't this be better?
public List<string> Foo(List<string> list)
{
// Edit
List<string> newlist = new List<string>(list);
newlist.Add("Bar");
return newlist;
}
It just feels like the first example has unexpected side effects.

In the example you've given, the first seems a lot nicer to me than the second. If I saw a method that accepted a list and also returned a list, my first assumption would be that it was returning a new list and not touching the one it was given. The second method, therefore, is the one with unexpected side effects.
As long as your methods are named appropriately there's little danger in modifying the parameter. Consider this:
public void Fill<T>(IList<T> list)
{
// add a bunch of items to list
}
With a name like "Fill" you can be pretty certain that the method will modify the list.

Frankly, in this case, both methods do more or less the same thing. Both will modify the List that was passed in.
If the objective is to have lists immutable by such a method, the second example should make a copy of the List that was sent in, and then perform the Add operation on the new List and then return that.
I'm not familiar with C# nor .NET, so my guess would be something along the line of:
public List<string> Foo(List<string> list)
{
List<string> newList = new List<string>(list); // List<T> has no Clone(); the copy constructor makes a shallow copy
newList.Add("Bar");
return newList;
}
This way, the method which calls the Foo method will get the newly created List returned, and the original List that was passed in would not be touched.
This really is up to the "contract" of your specifications or API, so in cases where Lists can just be modified, I don't see a problem with going with the first approach.
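The two contracts can be compared side by side in a short sketch. AddInPlace and AddToCopy are hypothetical names for the two versions of Foo discussed above.

```csharp
using System;
using System.Collections.Generic;

internal static class Demo
{
    // Mutates the caller's list.
    internal static void AddInPlace(List<string> list)
    {
        list.Add("Bar");
    }

    // Leaves the caller's list alone and returns a modified shallow copy.
    internal static List<string> AddToCopy(List<string> list)
    {
        var newList = new List<string>(list);
        newList.Add("Bar");
        return newList;
    }

    private static void Main()
    {
        var original = new List<string> { "Foo" };

        var copy = AddToCopy(original);
        Console.WriteLine(original.Count); // 1
        Console.WriteLine(copy.Count);     // 2

        AddInPlace(original);
        Console.WriteLine(original.Count); // 2
    }
}
```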

You're doing the exact same thing in both methods, just one of them is returning the same list.
It really depends on what you're doing, in my opinion. Just make sure your documentation is clear on what is going on. Write pre-conditions and post-conditions if you're into that sort of thing.

It's actually not that unexpected that a method that takes a list as parameter modifies the list. If you want a method that only reads from the list, you would use an interface that only allows reading:
public int GetLongest(IEnumerable<string> list) {
int len = 0;
foreach (string s in list) {
len = Math.Max(len, s.Length);
}
return len;
}
By using an interface like this you don't only prohibit the method from changing the list, it also gets more flexible as it can use any collection that implements the interface, like a string array for example.
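A quick sketch of that flexibility, reusing GetLongest from above with made-up sample data:

```csharp
using System;
using System.Collections.Generic;

internal static class Demo
{
    internal static int GetLongest(IEnumerable<string> list)
    {
        int len = 0;
        foreach (string s in list)
        {
            len = Math.Max(len, s.Length);
        }
        return len;
    }

    private static void Main()
    {
        string[] array = { "a", "abc" };
        var list = new List<string> { "hello", "hi" };

        // Both an array and a List<string> can be passed as IEnumerable<string>.
        Console.WriteLine(GetLongest(array)); // 3
        Console.WriteLine(GetLongest(list));  // 5
    }
}
```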
Some other languages have a const keyword that can be applied to parameters to prohibit a method from changing them. As .NET has interfaces that you can use for this and strings that are immutable, there isn't really a need for const parameters.

The advent of extension methods has made it a bit easier to deal with methods that introduce side effects. For example, in your example it becomes much more intuitive to say
public static class Extensions
{
public static void AddBar(this List<string> list)
{
list.Add("Bar");
}
}
and call it with
mylist.AddBar();
which makes it clearer that something is happening to the list.
As mentioned in the comments, this is most useful on lists, since modifications to a list tend to be more confusing. On a simple object, I would tend just to modify the object in place.
