Is there an "Empty List" singleton in C#? - c#

In C# I use LINQ and IEnumerable a good bit. And all is well-and-good (or at least mostly so).
However, in many cases I find that I need an empty IEnumerable<X> as a default. That is, I would like
foreach (var x in xs) { ... }
to work without needing a null-check. Now this is what I currently do, depending upon the larger context:
var xs = f() ?? new X[0]; // when xs is assigned, sometimes
foreach (var x in xs ?? new X[0]) { ... } // inline, sometimes
Now, while the above is perfectly fine for me -- that is, if there is any "extra overhead" with creating the array object I just don't care -- I was wondering:
Is there an "empty immutable IEnumerable/IList" singleton in C#/.NET? (And, even if not, is there a "better" way to handle the case described above?)
Java has the Collections.EMPTY_LIST immutable singleton -- "well-typed" via Collections.<T>emptyList() -- which serves this purpose, although I am not sure if a similar concept could even work in C# because generics are handled differently.
Thanks.

You are looking for Enumerable.Empty<T>().
In other news the Java empty list sucks because the List interface exposes methods for adding elements to the list which throw exceptions.

Enumerable.Empty<T>() is exactly that.

In your original example you use an empty array to provide an empty enumerable. While using Enumerable.Empty<T>() is perfectly right, there might be other cases: if you have to use an array (or the IList<T> interface), you can use the method
System.Array.Empty<T>()
which helps you to avoid unnecessary allocations.
Notes / References:
the documentation does not mention that this method allocates the empty array only once for each type
Roslyn analyzers recommend this method with the warning CA1825: Avoid zero-length array allocations
Microsoft reference implementation
.NET Core implementation
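As a rough illustration of the suggestions above, here is a minimal sketch. The class EmptyDefaults and its methods OrEmpty and SameEmptyArray are hypothetical names made up for this example. That Array.Empty<T>() hands back one cached instance per type is an implementation detail rather than a documented guarantee, so treat the second method as an observation, not a contract.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EmptyDefaults
{
    // Hypothetical helper: substitutes a shared empty sequence for null.
    public static IEnumerable<T> OrEmpty<T>(IEnumerable<T> xs)
    {
        return xs ?? Enumerable.Empty<T>();
    }

    // Reports whether two calls to Array.Empty<T>() return the same instance.
    public static bool SameEmptyArray<T>()
    {
        return ReferenceEquals(Array.Empty<T>(), Array.Empty<T>());
    }
}
```

With this in place, a loop such as foreach (var x in EmptyDefaults.OrEmpty(xs)) { ... } runs zero times when xs is null, without allocating a new array on each call.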

I think you're looking for Enumerable.Empty<T>().
Empty list singleton doesn't make that much sense, because lists are often mutable.

I think adding an extension method is a clean alternative thanks to their ability to handle nulls - something like:
public static IEnumerable<T> EmptyIfNull<T>(this IEnumerable<T> list)
{
return list ?? Enumerable.Empty<T>();
}
foreach(var x in xs.EmptyIfNull())
{
...
}

Using Enumerable.Empty<T>() with lists has a drawback. If you hand Enumerable.Empty<T> into the list constructor then an array of size 4 is allocated. But if you hand an empty Collection into the list constructor then no allocation occurs. So if you use this solution throughout your code then most likely one of the IEnumerables will be used to construct a list, resulting in unnecessary allocations.

Microsoft implemented Any() like this (source):
public static bool Any<TSource>(this IEnumerable<TSource> source)
{
if (source == null) throw new ArgumentNullException("source");
using (IEnumerator<TSource> e = source.GetEnumerator())
{
if (e.MoveNext()) return true;
}
return false;
}
If you want to save a call on the call stack, instead of writing an extension method that calls !Any(), just make these three changes:
public static bool IsEmpty<TSource>(this IEnumerable<TSource> source) //first change (name)
{
if (source == null) throw new ArgumentNullException("source");
using (IEnumerator<TSource> e = source.GetEnumerator())
{
if (e.MoveNext()) return false; //second change
}
return true; //third change
}

Related

Given Array.Cast<T>(), how do I determine T via reflection?

TL;DR - I would expect these all to work the same, but (per comments) they do not:
var c1 = new[] { FileMode.Append }.Cast<int>();
var c2 = new[] { FileMode.Append }.Select(x => (int)x);
var c3 = new[] { FileMode.Append }.Select(x => x).Cast<int>();
foreach (var x in c1 as IEnumerable)
Console.WriteLine(x); // Append (I would expect 6 here!)
foreach (var x in c2 as IEnumerable)
Console.WriteLine(x); // 6
foreach (var x in c3 as IEnumerable)
Console.WriteLine(x); // 6
This is a contrived example; I obviously wouldn't cast the collections to IEnumerable if I didn't have to, and in that case everything would work as expected. But I'm working on a library with several methods that take an object and return a serialized string representation. If it determines via reflection that the object implements IEnumerable, it will enumerate it and, in almost all cases, return the expected result...except for this strange case with Array.Cast<T>.
There are two things I could do here:
Tell users to materialize IEnumerables first, such as with ToList().
Create an overload for each affected method that takes an IEnumerable<T>.
For different reasons, neither of those is ideal. Is it possible for a method that takes an object to somehow infer T when Array.Cast<T>() is passed?
"Is it possible for a method that takes an object to somehow infer T when Array.Cast<T>() is passed?"
No, not in the example you gave.
The reason you get the output you do is that the Enumerable.Cast<T>() method has an optimization to allow the original object to be returned when it's compatible with the type you ask for:
public static IEnumerable<TResult> Cast<TResult>(this IEnumerable source) {
IEnumerable<TResult> typedSource = source as IEnumerable<TResult>;
if (typedSource != null) return typedSource;
if (source == null) throw Error.ArgumentNull("source");
return CastIterator<TResult>(source);
}
So in your first case, nothing actually happens. The Cast<T>() method is just returning the object you passed into the method, and so by the time you get it back, the fact that it ever went through Cast<T>() is completely lost.
Your question doesn't have any other details about how you got into this situation or why it matters in a practical sense. But we can say conclusively that given the code you posted, it would be impossible to achieve the goal you've stated.
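The shortcut described above can be observed directly. The sketch below uses a hypothetical helper, CastIdentityDemo.ReturnsSameObject, which is not part of any library; it only checks whether Cast<TResult>() returned the very object it was given.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class CastIdentityDemo
{
    // True when Cast<TResult>() handed back the same object it received,
    // i.e. when the "already compatible" branch was taken.
    public static bool ReturnsSameObject<TResult>(IEnumerable source)
    {
        IEnumerable<TResult> result = source.Cast<TResult>();
        return ReferenceEquals(source, result);
    }
}
```

For a string[] and TResult = string the method reports true, because the array already is an IEnumerable<string>. For an int[] and TResult = object it reports false, because the array is not an IEnumerable<object> and so gets wrapped in an iterator. Once the original object comes back unchanged, nothing about it records that Cast<T>() was ever called.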

Cast from IEnumerable to IEnumerable<object>

I prefer to use IEnumerable<object>, for LINQ extension methods are defined on it, not IEnumerable, so that I can use, for example, range.Skip(2). However, I also prefer to use IEnumerable, for T[] is implicitly convertible to IEnumerable whether T is a reference type or value type. For the latter case, no boxing is involved, which is good. As a result, I can do IEnumerable range = new[] { 1, 2, 3 }. It seems impossible to combine the best of both worlds. Anyway, I chose to settle down to IEnumerable and do some kind of cast when I need to apply LINQ methods.
From this SO thread, I learned that range.Cast<object>() is able to do the job. But it incurs performance overhead which is unnecessary in my opinion. I tried to perform a direct compile-time cast like (IEnumerable<object>)range. According to my tests, it works for reference element types but not for value types. Any ideas?
FYI, the question stems from this GitHub issue. And the test code I used is as follows:
static void Main(string[] args)
{
// IEnumerable range = new[] { 1, 2, 3 }; // won't work
IEnumerable range = new[] { "a", "b", "c" };
var range2 = (IEnumerable<object>)range;
foreach (var item in range2)
{
Console.WriteLine(item);
}
}
"According to my tests, it works for reference element type but not for value type."
Correct. This is because IEnumerable<out T> is co-variant, and co-variance/contra-variance is not supported for value types.
"I come to know that range.Cast<object>() is able to do the job. But it incurs performance overhead which is unnecessary in my opinion."
IMO the performance cost (brought by boxing) is unavoidable if you want a collection of objects from a collection of value types. Using the non-generic IEnumerable won't avoid boxing, because IEnumerable.GetEnumerator provides an IEnumerator whose .Current property returns an object. I'd prefer to always use IEnumerable<T> instead of IEnumerable. So just use the .Cast method and forget about the boxing.
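To make the variance point concrete, here is a small sketch. VarianceDemo and its two methods are hypothetical names used only for illustration.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class VarianceDemo
{
    // Covariance applies only to reference-type arguments, so this type test
    // succeeds for a string[] but fails for an int[].
    public static bool IsObjectSequence(IEnumerable source)
    {
        return source is IEnumerable<object>;
    }

    // Fallback that works for any element type; value types are boxed here.
    public static List<object> ToObjectList(IEnumerable source)
    {
        return source.Cast<object>().ToList();
    }
}
```

A direct cast to IEnumerable<object> is safe exactly when IsObjectSequence returns true; for value-type elements, something like ToObjectList (and its boxing) is the price of getting objects.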
After decompiling that extension, the source showed this:
public static IEnumerable<TResult> Cast<TResult>(this IEnumerable source)
{
IEnumerable<TResult> enumerable = source as IEnumerable<TResult>;
if (enumerable != null)
return enumerable;
if (source == null)
throw Error.ArgumentNull("source");
return Enumerable.CastIterator<TResult>(source);
}
private static IEnumerable<TResult> CastIterator<TResult>(IEnumerable source)
{
foreach (TResult result in source)
yield return result;
}
So for a source that already is an IEnumerable<TResult>, this basically does nothing but return the source itself.
You stated:
"According to my tests, it works for reference element type but not for value type."
How did you test that?
Although I really do not like this approach, I know it is possible to provide a toolset similar to LINQ-to-Objects that is callable directly on an IEnumerable interface, without forcing a cast to IEnumerable<object> (bad: possible boxing!) and without casting to IEnumerable<TFoo> (even worse: we'd need to know and write TFoo!).
However, it is:
not free for runtime: it may be heavy, I didn't run a performance test
not free for developer: you actually need to write all those LINQ-like extension methods for IEnumerable (or find a lib that does it)
not simple: you need to inspect the incoming type carefully and need to be careful with many possible options
is not an oracle: given a collection that implements IEnumerable but does not implement IEnumerable<T> it only can throw error or silently cast it to IEnumerable<object>
will not always work: given a collection that implements both IEnumerable<int> and IEnumerable<string> it simply cannot know what to do; even giving up and casting to IEnumerable<object> doesn't sound right here
Here's an example for .Net4+:
using System;
using System.Linq;
using System.Collections.Generic;
class Program
{
public static void Main()
{
Console.WriteLine("List<int>");
new List<int> { 1, 2, 3 }
.DoSomething()
.DoSomething();
Console.WriteLine("List<string>");
new List<string> { "a", "b", "c" }
.DoSomething()
.DoSomething();
Console.WriteLine("int[]");
new int[] { 1, 2, 3 }
.DoSomething()
.DoSomething();
Console.WriteLine("string[]");
new string[] { "a", "b", "c" }
.DoSomething()
.DoSomething();
Console.WriteLine("nongeneric collection with ints");
var stack = new System.Collections.Stack();
stack.Push(1);
stack.Push(2);
stack.Push(3);
stack
.DoSomething()
.DoSomething();
Console.WriteLine("nongeneric collection with mixed items");
new System.Collections.ArrayList { 1, "a", null }
.DoSomething()
.DoSomething();
Console.WriteLine("nongeneric collection with .. bits");
new System.Collections.BitArray(0x6D)
.DoSomething()
.DoSomething();
}
}
public static class MyGenericUtils
{
public static System.Collections.IEnumerable DoSomething(this System.Collections.IEnumerable items)
{
// check the REAL type of incoming collection
// if it implements IEnumerable<T>, we're lucky!
// we can unwrap it
// ...usually. How to unwrap it if it implements it multiple times?!
var ietype = items.GetType().FindInterfaces((t, args) =>
t.IsGenericType && t.GetGenericTypeDefinition() == typeof(IEnumerable<>),
null).SingleOrDefault();
if (ietype != null)
{
return
doSomething_X(
doSomething_X((dynamic)items)
);
// The calls are nested rather than chained as .doSomething_X().doSomething_X()
// because the compile-time type is 'dynamic' (chaining would actually compile),
// and `dynamic` doesn't resolve extension methods:
// it would look for doSomething_X inside the returned object.
// But that's just the INTERNAL implementation. For the user
// on the outside it's chainable.
}
else
// uh-oh. now what? it can be an array, it can be a non-generic collection
// like System.Collections.Hashtable .. but..
// from the type-definition point of view it means it holds any
// OBJECTs inside, even mixed types, and it exposes them as IEnumerable
// which returns them as OBJECTs, so..
return items.Cast<object>()
.doSomething_X()
.doSomething_X();
}
private static IEnumerable<T> doSomething_X<T>(this IEnumerable<T> valitems)
{
// do-whatever,let's just see it being called
Console.WriteLine("I got <{1}>: {0}", valitems.Count(), typeof(T));
return valitems;
}
}
Yes, that's silly. I chained them four (2outsidex2inside) times just to show that the type information is not lost in subsequent calls. The point was to show that the 'entry point' takes a nongeneric IEnumerable and that <T> is resolved wherever it can be. You can easily adapt the code to make it a normal LINQ-to-Objects .Count() method. Similarly, one can write all other operations, too.
This example uses dynamic to let the platform resolve the narrowest T for IEnumerable, if possible (which we need to ensure first). Without dynamic (i.e. .Net2.0) we'd need to invoke doSomething_X through reflection, or implement it twice as doSomething_refs<T>() where T : class plus doSomething_vals<T>() where T : struct and do some magic to call it properly without actually casting (probably reflection, again).
Nevertheless, it seems that you can get something-like-linq working "directly" on things hidden behind nongeneric IEnumerable. All thanks to the fact that the objects hiding behind IEnumerable still have their own full type information (yeah, that assumption may fail with COM or Remoting). However.. I think settling for IEnumerable<T> is a better option. Let's leave plain old IEnumerable to special cases where there is really no other option.
..oh.. and I actually didn't investigate if the code above is correct, fast, safe, resource-conserving, lazy-evaluating, etc.
IEnumerable<T> is a generic interface. As long as you're only dealing with generics and types known at compile-time, there's no point in using IEnumerable<object> - either use IEnumerable<int> or IEnumerable<T>, depending entirely on whether you're writing a generic method, or one where the correct type is already known. Don't try to find an IEnumerable to fit them all - use the correct one in the first place - it's very rare for that not to be possible, and most of the time, it's simply a result of bad object design.
The reason IEnumerable<int> cannot be cast to IEnumerable<object> may be somewhat surprising, but it's actually very simple - value types aren't polymorphic, so they don't support co-variance. Do not be mistaken - IEnumerable<string> doesn't implement IEnumerable<object> - the only reason you can cast IEnumerable<string> to IEnumerable<object> is that IEnumerable<T> is co-variant.
It's just a funny case of "surprising, yet obvious". It's surprising, since int derives from object, right? And yet, it's obvious, because int doesn't really derive from object, even though it can be cast to an object through a process called boxing, which creates a "real object-derived int".

IGrouping ElementAt vs. square bracket operator

IGrouping supports the ElementAt method to index into the grouping's collection. So why doesn't the square bracket operator work?
I can do something like
list.GroupBy(expr).Select(group => group.ElementAt(0)....)
but not
list.GroupBy(expr).Select(group => group[0]....)
I'm guessing this is because the IGrouping interface doesn't overload the square bracket operator. Is there a good reason why IGrouping didn't overload the square bracket operator to do the same thing as ElementAt?
That is because GroupBy returns an IEnumerable. IEnumerables don't have an indexing accessor
ElementAt<T> is a standard extension method on IEnumerable<T>, it's not a method on IGrouping, but since IGrouping derives from IEnumerable<T>, it works fine. There is no [] extension method because it's not supported by C# (it would be an indexed property, not a method)
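A short sketch of the two options that do compile. GroupingDemo and its methods are hypothetical names, and grouping by n % 2 is just an arbitrary key chosen for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupingDemo
{
    // Returns the first element of each group, in order of first key appearance.
    public static List<int> FirstOfEachGroup(IEnumerable<int> numbers)
    {
        return numbers
            .GroupBy(n => n % 2)
            .Select(group => group.ElementAt(0)) // group[0] would not compile
            .ToList();
    }

    // Materializing a group gives a List<int>, which does have an indexer.
    public static int SecondOfFirstGroup(IEnumerable<int> numbers)
    {
        List<int> first = numbers.GroupBy(n => n % 2).First().ToList();
        return first[1];
    }
}
```

For the input 1, 2, 3, 4, 5, 6 the groups are (1, 3, 5) and (2, 4, 6), so FirstOfEachGroup yields 1 and 2, and SecondOfFirstGroup yields 3.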
That's a bit back to front: all enumerables are supported by ElementAt() (rather than supporting it, since it's an extension method provided from the outside), but only some are of a type that also supports [], such as List<T> or anything else that implements IList<T>.
Grouping certainly could implement [] easily enough, but then it would have to always do so: the API would be a promise it had to keep, because dropping it later would break code written against the old version.
ElementAt() takes a test-and-use approach in that if something supports IList<T> it will use [] but otherwise it counts the appropriate number along. Since you can count-along with any sequence, it can therefore support any enumerable.
It so happens that Grouping does support IList<T> but as an explicit interface, so the following works:
//Bad code for demonstration purpose, do not use:
((IList<int>)Enumerable.Range(0, 50).GroupBy(i => i / 5).First())[3]
But because it's explicit it doesn't have to keep supporting it if there was ever an advantage found in another approach.
The test-and-use approach of ElementAt:
public static TSource ElementAt<TSource>(this IEnumerable<TSource> source, int index)
{
if (source == null) throw Error.ArgumentNull("source");
IList<TSource> list = source as IList<TSource>;
if (list != null) return list[index];
if (index < 0) throw Error.ArgumentOutOfRange("index");
using (IEnumerator<TSource> e = source.GetEnumerator())
{
while (true)
{
if (!e.MoveNext()) throw Error.ArgumentOutOfRange("index");
if (index == 0) return e.Current;
index--;
}
}
}
ElementAt therefore gets the optimal O(1) behaviour when the source is a list, falling back to O(n) otherwise, but without restricting Grouping to making a promise the designers might later regret making.
ElementAt is an extension method (btw highly inefficient) defined for IEnumerable<T> to provide pseudo-indexed access for sequences that do not natively support it. Since the IGrouping<TKey, TElement> returned from a GroupBy method inherits IEnumerable<TElement>, you can use the ElementAt method. But of course IEnumerable<T> does not have an indexer defined, so that's why you cannot use [].
Update: Just to clarify what I meant by "highly inefficient" - the implementation is the best that could be provided, but the method itself is inefficient in general for sequences that do not natively support an indexer. For example, in the following loop each ElementAt(i) call may have to enumerate i elements from the start, which can make the whole loop quadratic:
var source = Enumerable.Range(0, 1000000);
for (int i = 0, count = source.Count(); i < count; i++)
{
var element = source.ElementAt(i);
}
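Two common ways to avoid repeated ElementAt calls are sketched below. IndexedAccessDemo and its method names are hypothetical; summing the elements is just a placeholder for whatever the loop body needs to do.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IndexedAccessDemo
{
    // One pass over the sequence; no repeated ElementAt calls.
    public static long SumSinglePass(IEnumerable<int> source)
    {
        long sum = 0;
        foreach (int element in source)
        {
            sum += element;
        }
        return sum;
    }

    // If indexed access is really needed, copy to a list once and index that.
    public static long SumViaList(IEnumerable<int> source)
    {
        List<int> list = source.ToList();
        long sum = 0;
        for (int i = 0; i < list.Count; i++)
        {
            sum += list[i];
        }
        return sum;
    }
}
```

Both variants walk the source once; the second trades one extra allocation for cheap indexing afterwards.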

How does c# decide which implementation to use for IEnumerable? [duplicate]

This question already has answers here:
How does IEnumerable<T> work in background
(3 answers)
Closed 8 years ago.
I just started learning C# and I'm a bit confused with the following piece of code from MSDN:
IEnumerable<string> strings =
Enumerable.Repeat("I like programming.", 15);
Since IEnumerable is an interface and Enumerable.Repeat<>() returns an IEnumerable type, how is "strings" implemented? As a List or some other container?
There is no internal collection in this case. In fact some enumerables may be infinite, so you cannot always expect an internal collection. In this particular method there is a field that holds the value and a loop that counts how many times MoveNext is called. When it reaches the specified count it stops the iteration. Of course this is implemented using the iterators feature of C#, which makes implementing it trivial.
That is an implementation detail. The signature of Enumerable.Repeat<string>() promises that the result is an IEnumerable<string>, and that's what it returns.
Whether this value is returned as an array, List or generated in any other way (the latter being the case) that implements the required IEnumerable<T>, doesn't matter. See the Remarks section on MSDN though, saying:
This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.
If you're interested in the actual implementation, see the proposed duplicate.
Currently anyway, the type returned is:
Enumerable/'<RepeatIterator>d__b5`1'<string>.
You can't declare a variable of that type because it's a compiler-generated type. Like anonymous types, such types are compiled with a name that, while valid in .NET, is not valid in C#, so you can't possibly accidentally create another type with the same name.
This particular compiler-generated type is the sort used to implement yield. Again, when you code yield in C# this is compiled by creating .NET classes that implement IEnumerable and IEnumerator.
Your code doesn't care about any of this; it just cares that it gets something implementing the interface.
For that point consider the validity of:
public IEnumerable<string> SomeStrings()
{
if(new Random().Next(0, 2) == 0)
return new HashSet<string>{"a", "b", "c"};
else
return new List<string>{"a", "b", "c"};
}
Calling code won't know if it got a HashSet<string> or a List<string> and won't care. It won't care if version 2.0 actually returns string[] or uses yield.
You could create your own Repeat as follows:
public static IEnumerable<T> Repeat<T>(T element, int count)
{
while(count-- > 0)
yield return element;
}
We have a complication though, in that we want to throw an exception if count is less than zero. We can't just do:
public static IEnumerable<T> Repeat<T>(T element, int count)
{
if(count < 0)
throw new ArgumentOutOfRangeException("count");
while(count-- != 0)
yield return element;
}
This doesn't work because the throw won't happen until we actually enumerate it (if we ever do), as yield-defined enumerations don't run any code until the first enumeration. Therefore we need:
public static IEnumerable<T> Repeat<T>(T element, int count)
{
if(count < 0)
throw new ArgumentOutOfRangeException("count");
return RepeatImpl<T>(element, count);
}
private static IEnumerable<T> RepeatImpl<T>(T element, int count)
{
while(count-- != 0)
yield return element;
}
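The difference between the two shapes can be checked with a small sketch. The names LazyRepeat, EagerRepeat, and ThrowsOnCall are hypothetical and exist only for this demonstration.

```csharp
using System;
using System.Collections.Generic;

public static class RepeatValidationDemo
{
    // Iterator method: the argument check runs only once enumeration starts.
    public static IEnumerable<T> LazyRepeat<T>(T element, int count)
    {
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
        while (count-- != 0) yield return element;
    }

    // Plain wrapper around the iterator: the argument check runs at call time.
    public static IEnumerable<T> EagerRepeat<T>(T element, int count)
    {
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
        return LazyRepeat(element, count);
    }

    // Reports whether merely calling the method (without enumerating) throws.
    public static bool ThrowsOnCall(Func<IEnumerable<string>> call)
    {
        try
        {
            call();
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }
}
```

ThrowsOnCall(() => RepeatValidationDemo.LazyRepeat("x", -1)) reports false, since no iterator code has run yet, while the same call with EagerRepeat reports true.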

How to cast a generic type to a non-generic type

I have a method that looks like this (assume that I have the necessary method GetMySerializedDataArray() and my serializer JsonSerializer):
public static List<T> GetMyListOfData<T>()
{
var msgList = new List<T>();
foreach (string s in GetMySerializedDataArray())
{
msgList.Add(JsonSerializer.Deserialize<T>(s));
}
return msgList;
}
This works fine and as expected.
However, I want to use the same method to optionally, if and only if the generic type is specified as string, return the data unserialized like this (which does not compile and has syntax problems):
public static List<T> GetMyListOfData<T>(bool leaveSerialized)
{
if (typeof (T) != typeof(string) && leaveSerialized)
{
throw new ArgumentException("Parameter must be false when generic type is not List<string>", "leaveSerialized");
}
var msgList = new List<T>();
foreach (string s in GetMySerializedDataArray())
{
if (leaveSerialized)
{
// Casting does not work: "Cannot cast expression of type 'System.Collections.Generic.List<T>' to type 'List<string>'"
// I've tried various permutations of "is" and "as"... but they don't work with generic types
// But I know in this case that I DO have a list of strings..... just the compiler doesn't.
// How do I assure the compiler?
((List<string>)msgList).Add(s);
}
else
{
msgList.Add(JsonSerializer.Deserialize<T>(s));
}
}
return msgList;
}
My questions are in the inline comments. Basically, the compiler clearly doesn't like the cast of generic to non-generic, and it won't let me use permutations of the "is" and "as" operators either. I know I actually have the correct string in this case, so how do I assure the compiler it is OK?
Many thanks in advance.
EDIT: SOLUTION
Thanks to Lee and Lorentz, both. I will be creating two public methods, but implementing the code in a private method with the admittedly icky decision tree about whether to leave serialization. My reason is that my real-world method is far more complex than what I posed here to SO, and I don't want to duplicate those business rules.
FINAL EDIT: CHANGED SOLUTION
Although both answers were very helpful, I have now been able to detangle business rules, and as a result the "correct" answer for me is now the first -- two different methods. Thanks again to all.
You should not return a list of strings as a list of T. I would suggest that you use two separate methods and skip the parameter:
public static List<T> GetMyListOfData<T>()
public static List<string> GetSerializedMyListOfData()
The advantages of this approach are:
It's more readable (imo): GetSerializedMyListOfData() vs GetMyListOfData<string>(true)
You also know the intent of the caller at compile time and don't have to throw an exception when the type argument doesn't match the intent to leave the data serialized
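One possible shape for the two-method approach, with the shared loop kept in a private helper so the business rules are not duplicated. This is only a sketch under assumptions: MessageReader, ReadAll, and the deserialize parameter are hypothetical, and GetMySerializedDataArray here is a stand-in that returns made-up data instead of the question's real source.

```csharp
using System;
using System.Collections.Generic;

public static class MessageReader
{
    // Stand-in for the question's GetMySerializedDataArray(); the data is made up.
    private static string[] GetMySerializedDataArray()
    {
        return new[] { "1", "2", "3" };
    }

    // Shared loop: the only thing that varies is how each string is converted.
    private static List<TResult> ReadAll<TResult>(Func<string, TResult> convert)
    {
        var list = new List<TResult>();
        foreach (string s in GetMySerializedDataArray())
        {
            list.Add(convert(s));
        }
        return list;
    }

    // Deserializing variant; the caller supplies the deserializer (hypothetical parameter).
    public static List<T> GetMyListOfData<T>(Func<string, T> deserialize)
    {
        return ReadAll(deserialize);
    }

    // Raw variant: returns the strings untouched.
    public static List<string> GetSerializedMyListOfData()
    {
        return ReadAll(s => s);
    }
}
```

In the question's setting the deserialize argument would simply be the JsonSerializer.Deserialize<T> call; passing it in as a delegate here just keeps the sketch free of any particular serializer.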
You can cast to object first:
((List<string>)(object)msgList).Add(s);
however a cleaner solution could be to create another method for dealing with strings, this would also allow you to remove the leaveSerialized parameter.
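For completeness, a minimal sketch of the cast through object in isolation. ObjectCastDemo.CollectRaw is a hypothetical method, not the asker's code; it shows that the cast compiles and that it only succeeds at runtime when T really is string.

```csharp
using System;
using System.Collections.Generic;

public static class ObjectCastDemo
{
    // Adds raw strings to a List<T>, which is only legal when T is string.
    public static List<T> CollectRaw<T>(IEnumerable<string> raw)
    {
        if (typeof(T) != typeof(string))
        {
            throw new ArgumentException("T must be string to keep the data raw.");
        }

        var list = new List<T>();
        // The compiler rejects (List<string>)list, but allows the detour via object;
        // the runtime check succeeds because T is string at this point.
        var strings = (List<string>)(object)list;
        foreach (string s in raw)
        {
            strings.Add(s);
        }
        return list;
    }
}
```

Without the typeof guard, calling this with any other T would fail with an InvalidCastException at the cast, which is one reason the separate-method design is the safer choice.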
