Implementation of WhereListIterator.ToList() - c#

In a piece of code like
List<int> foo = new List<int>() { 1, 2, 3, 4, 5, 6 };
IEnumerable<int> bar = foo.Where(x => x % 2 == 1);
bar is of type System.Linq.Enumerable.WhereListIterator<int> due to deferred execution. Since it implements IEnumerable<int>, it is possible to convert it to a List<int> using ToList(). However, I have been unable to identify some parts of the code that is run when ToList() is called. I am using dotPeek as a decompiler and this is my first time attempting such a thing, so correct me if I made any mistakes along the way.
I will describe what I found so far below (All assemblies are Version 4.0.0.0):
Enumerable.WhereListIterator<TSource> is implemented in the file Enumerable.cs of the namespace System.Linq in the assembly System.Core. The class neither defines ToList() itself nor implements IEnumerable<TSource> directly. It derives from Enumerable.Iterator<TSource>, which is located in the same file and does implement IEnumerable<TSource>.
ToList() is an extension method that is also located in Enumerable.cs. All it does is null checking and then calling the constructor of List<TSource> with its argument.
List<T> is defined in the file List.cs of the namespace System.Collections.Generic in the assembly mscorlib. The constructor that is called by ToList() has the signature public List(IEnumerable<T> collection). It once again null checks and then casts the argument to ICollection<T>. If the collection has no elements, it creates a new list backed by an empty array; otherwise it uses the ICollection<T>.CopyTo() method to create the new list.
ICollection<T> is defined in mscorlib \ System.Collections.Generic \ ICollection.cs. It implements IEnumerable in its generic and non-generic form.
This is where I am stuck. Neither Enumerable.WhereListIterator<TSource> nor Enumerable.Iterator<TSource> implements ICollection<T>, so somewhere a cast has to happen, and I am unable to locate the code that is run when CopyTo() is called.

This is the relevant part in the List<T> constructor (ILSpy):
ICollection<T> collection2 = collection as ICollection<T>; // this won't succeed
if (collection2 != null)
{
    int count = collection2.Count;
    this._items = new T[count];
    collection2.CopyTo(this._items, 0);
    this._size = count;
    return;
}
// this will be used instead
this._size = 0;
this._items = new T[4];
using (IEnumerator<T> enumerator = collection.GetEnumerator())
{
    while (enumerator.MoveNext())
    {
        this.Add(enumerator.Current);
    }
}
As you can see, collection as ICollection<T> tries to cast the argument to ICollection<T>. If that works, the efficient CopyTo is used; otherwise the sequence is enumerated entirely.
Your WhereListIterator<int> is a query and not a collection, so it cannot be cast to ICollection<T>; hence it will be enumerated.
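A quick check makes this concrete. The sketch below is only an illustration, not the framework's code; the class and the helper name IsCollection are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhereToListDemo
{
    // Hypothetical helper: true when the List<T> constructor could take the CopyTo path.
    public static bool IsCollection<T>(IEnumerable<T> source)
    {
        return source is ICollection<T>;
    }

    public static void Main()
    {
        var foo = new List<int> { 1, 2, 3, 4, 5, 6 };
        IEnumerable<int> bar = foo.Where(x => x % 2 == 1);

        Console.WriteLine(IsCollection(foo)); // True: a List<int> is an ICollection<int>
        Console.WriteLine(IsCollection(bar)); // False: the Where iterator is only enumerable
        Console.WriteLine(string.Join(", ", bar.ToList())); // 1, 3, 5
    }
}
```

Since the second check is false, ToList() has to fall back to enumerating bar item by item.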

I think you're getting confused by the as operator. It's basically a safe cast: it yields null instead of throwing when the conversion is not possible. It's equivalent to this, but a bit faster:
MyEndType x = null;
if (MyVarWithAs is MyEndType) x = (MyEndType)MyVarWithAs;
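To see the difference from a direct cast, here is a minimal sketch; the class name AsOperatorDemo and the helper AsList are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class AsOperatorDemo
{
    // Returns the argument as a list, or null when it is something else.
    public static List<int> AsList(object value)
    {
        return value as List<int>;
    }

    public static void Main()
    {
        object text = "not a list";
        Console.WriteLine(AsList(text) == null); // True: 'as' yields null, no exception

        try
        {
            var list = (List<int>)text; // a direct cast throws instead
            Console.WriteLine(list.Count);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("InvalidCastException");
        }
    }
}
```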
Now let's look at the code again.
public List(IEnumerable<T> collection)
{
    if (collection == null)
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
    ICollection<T> collection1 = collection as ICollection<T>;
    if (collection1 != null)
    {
        int count = collection1.Count;
        if (count == 0)
        {
            this._items = List<T>._emptyArray;
        }
        else
        {
            this._items = new T[count];
            collection1.CopyTo(this._items, 0);
            this._size = count;
        }
    }
    else
    {
        this._size = 0;
        this._items = List<T>._emptyArray;
        foreach (T obj in collection)
            this.Add(obj);
    }
}
As you can see, the if checks whether collection1 is non-null. If it is null, the argument is not an ICollection<T>, so execution goes to the else branch. All the else does is set everything to the defaults and then add every element one by one. When you pass in an IEnumerable<T> that is not an ICollection<T> (like in your example), it goes through the else path.

Related

Foreach and enumerable

Why can we iterate over the items of a list like this:
mList.ForEach((item)
{
item.xyz ....
}
while for a simple array we need to use a foreach loop?
foreach(int i in arr)
i.xyz
or use a delegate type?
Action<int> action = new Action<int>(myfunc);
Array.ForEach(intArray, action);
What is the difference?
The first syntax is not correct. It should be like this:
mList.ForEach(item =>
{
// item.xyz
});
ForEach is a method of List<T> that calls an Action<T> for each item in the list.
On the other hand, the foreach
statement repeats a group of embedded statements for each element in
an array or an object collection that implements the
System.Collections.IEnumerable or
System.Collections.Generic.IEnumerable<T> interface.
That being said, ForEach can be called only on lists, while foreach can be used on any object that implements either IEnumerable or IEnumerable<T>. That's the big difference here.
Regarding the delegate type, there isn't any difference. Actually, lambda expressions such as item => { item.xyz = ... } are a shorthand for creating delegates.
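The three forms can be put side by side. This is a small sketch; the class name ForEachDemo and the method Collect are made up here, and the messages are only there to show which call produced which item.

```csharp
using System;
using System.Collections.Generic;

public static class ForEachDemo
{
    public static List<string> Collect()
    {
        var seen = new List<string>();
        var list = new List<int> { 1, 2 };
        int[] array = { 3, 4 };

        // Instance method on List<T>, taking a lambda that becomes an Action<int>.
        list.ForEach(item => seen.Add("List.ForEach " + item));

        // Static method on Array, taking an explicitly typed delegate.
        Action<int> action = item => seen.Add("Array.ForEach " + item);
        Array.ForEach(array, action);

        // The foreach statement works on anything enumerable, arrays included.
        foreach (int item in array)
        {
            seen.Add("foreach " + item);
        }

        return seen;
    }

    public static void Main()
    {
        foreach (string line in Collect())
        {
            Console.WriteLine(line);
        }
    }
}
```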
The language defines foreach as an operation of IEnumerable. Therefore, everything which implements IEnumerable can be iterated. However, not all IEnumerables 'make sense' when using a ForEach block.
Take this for example:
public static IEnumerable<MyObject> GetObjects()
{
    var i = 0;
    while (i < 30)
        yield return new MyObject { Name = "Object " + i++ };
}
And then you do something like this:
var objects = GetObjects();
objects.ForEach(o => o.Name = "Rob");
foreach (var obj in objects)
    Console.WriteLine(obj.Name);
If that compiled, it would print out Object 0 to Object 29 - NOT Rob 30 times.
The reason for this is that the iterator method runs afresh each time you iterate the enumerable, so new objects are created on every pass. ForEach makes sense on a list, as the enumerable has been materialized and objects are not re-created every time you iterate it.
In order to make ForEach work on an enumerable, you'd need to materialize the collection as well (such as putting it into a list), but even that is not always possible, as you can have an enumerable with no defined end:
public static IEnumerable<MyObject> GetObjects()
{
    while (true)
        yield return new MyObject { Name = "Object " };
}
It also makes sense to have ForEach on Array - but for reasons I'm unaware of, it was defined as the static Array.ForEach(arr, action) rather than arr.ForEach(action).
Moral of the story is, if you think you need a ForEach block, you probably want to materialize the enumerable first, usually to a List<T> or an array (T[]).
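The effect can be reproduced without a ForEach extension by mutating inside an ordinary foreach. This is a minimal sketch: MyObject is reduced to a single property, only three objects are produced, and the helper FirstNameAfterRename is invented for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class MyObject
{
    public string Name { get; set; } = "";
}

public static class LazyMutationDemo
{
    // Lazy sequence: the body runs again on every enumeration.
    public static IEnumerable<MyObject> GetObjects()
    {
        for (int i = 0; i < 3; i++)
        {
            yield return new MyObject { Name = "Object " + i };
        }
    }

    // Renames every object, then enumerates again and reports the first name.
    public static string FirstNameAfterRename(IEnumerable<MyObject> objects)
    {
        foreach (MyObject o in objects)
        {
            o.Name = "Rob";
        }
        return objects.First().Name;
    }

    public static void Main()
    {
        Console.WriteLine(FirstNameAfterRename(GetObjects()));          // Object 0
        Console.WriteLine(FirstNameAfterRename(GetObjects().ToList())); // Rob
    }
}
```

In the first call the renamed objects are thrown away, because the second enumeration creates new ones. After ToList() both passes see the same instances.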

List<String> ByRef

I'm wondering how one can prove what the .Net framework is doing behind the scenes.
I have a method that accepts a parameter of a List<String> originalParameterList.
In my method I have another List<String> newListObj. If I do the following:
List<String> newListObj = originalParameterList;
newListObj.Add(value);
newListObj.Add(value1);
newListObj.Add(value2);
The count of the originalParameterList grows (+3).
If I do this:
List<String> newListObj = new List<String>(originalParameterList);
newListObj.Add(value);
newListObj.Add(value1);
newListObj.Add(value2);
The count of the originalParameterList stays the same (+0).
I also found that this code behaves the same:
List<String> newListObj = new List<String>(originalParameterList.ToArray());
newListObj.Add(value);
newListObj.Add(value1);
newListObj.Add(value2);
The count of the originalParameterList stays the same (+0).
My question is, is there a way to see what the .Net Framework is doing behind the scenes in a definitive way?
You can load your assembly into ILDASM, find your method once it is loaded, and double-click it;
it will show the CIL code of that method. Just type "IL" in the Windows Start menu search.
Alternatively, you can use the following ways to create a new independent list:
private void GetList(List<string> lst)
{
    List<string> NewList = lst.Cast<string>().ToList();
    NewList.Add("6");
    // lst is unchanged, so the two lists no longer hold the same values.
    // or....
    List<string> NewList2 = lst.ConvertAll(s => s);
    NewList2.Add("6");
    // again different values
}
Normally, the documentation should give enough information to use the API.
In your specific example, the documentation for public List(IEnumerable<T> collection) says (emphasis mine):
Initializes a new instance of the List class that contains elements
copied from the specified collection and has sufficient capacity to
accommodate the number of elements copied.
For reference, here is the source code for the constructor:
public List (IEnumerable <T> collection)
{
    if (collection == null)
        throw new ArgumentNullException ("collection");
    // initialize to needed size (if determinable)
    ICollection <T> c = collection as ICollection <T>;
    if (c == null) {
        _items = EmptyArray<T>.Value;
        AddEnumerable (collection);
    } else {
        _size = c.Count;
        _items = new T [Math.Max (_size, DefaultCapacity)];
        c.CopyTo (_items, 0);
    }
}
void AddEnumerable (IEnumerable <T> enumerable)
{
    foreach (T t in enumerable)
    {
        Add (t);
    }
}
The simplest way is to go to MSDN:
http://msdn.microsoft.com/en-us/library/fkbw11z0.aspx
It says that
Initializes a new instance of the List class that contains elements copied from the specified collection and has sufficient capacity to accommodate the number of elements copied.
So internally it simply adds all elements of the passed IEnumerable to the new list. It also says that
this is a O(n) operation
which means that no optimizations should be assumed.
That's because in the first case you referenced the original list (since List<T> is a reference type), and you modified its contents via newListObj. In the second and third cases you copied the original list's contents via the List<T> constructor and modified the new collection, which has no effect on the original.
As others already said, there are various tools that let you examine the source code of the .NET framework. I personally prefer dotPeek from JetBrains, which is free.
In the specific case that you have mentioned, I think when you pass a list into the constructor of another list, that list is copied. If you just assign one variable to another, those variables are then simply referring to the same list.
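The difference between assigning and copying can also be checked directly in code. The following is a small sketch with invented names (ListCopyDemo and its two methods); it mirrors the two cases from the question.

```csharp
using System;
using System.Collections.Generic;

public static class ListCopyDemo
{
    // Assignment: both variables refer to the same list object.
    public static int OriginalCountAfterAliasAdd()
    {
        var original = new List<string> { "a", "b" };
        List<string> alias = original;
        alias.Add("c");
        return original.Count;
    }

    // Copy constructor: the new list has its own storage.
    public static int OriginalCountAfterCopyAdd()
    {
        var original = new List<string> { "a", "b" };
        var copy = new List<string>(original);
        copy.Add("c");
        return original.Count;
    }

    public static void Main()
    {
        Console.WriteLine(OriginalCountAfterAliasAdd()); // 3
        Console.WriteLine(OriginalCountAfterCopyAdd());  // 2
    }
}
```

Note that the copy is shallow: the new list holds the same element references, which makes no visible difference for immutable strings.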
You can either
read the documentation over at MSDN
decompile the resulting MSIL-code, for instance using Telerik's free JustDecompile
or step through the .NET Framework code using the debugger.
This is the code from the List constructor:
public List(IEnumerable<T> collection)
{
    if (collection == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
    }
    ICollection<T> collection2 = collection as ICollection<T>;
    if (collection2 != null)
    {
        int count = collection2.Count;
        this._items = new T[count];
        collection2.CopyTo(this._items, 0);
        this._size = count;
        return;
    }
    this._size = 0;
    this._items = new T[4];
    using (IEnumerator<T> enumerator = collection.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            this.Add(enumerator.Current);
        }
    }
}
As you can see, when you call the constructor that takes an IEnumerable<T>, it copies all the data into the new list.

Is it possible to count something from general container in c#?

I have an object.
This object is a container of items that has been cast to object (I don't know what the items are, but I can check).
But is there any code which can help me find how many items it contains?
I mean
object[] arrObj = new object[2] {1, 2};
object o = (object)arrObj;
In this case arrObj is an array so I can check:
((Array)o).Length //2
But what if I have these 2 others?
ArrayList al = new ArrayList(2);
al.Add(1);
al.Add(2);
object o = (object)al ;
and
List<object> lst= new List<object>(2);
object o = (object)lst;
Is there any general code which can help me find how many items are in this cast object (o in these samples)?
Of course I can check if (o is ...) { } but I'm looking for more general code.
You can cast to the interface every container implements: IEnumerable. However, to be more performant, it is a good idea to first try IEnumerable<T>:
var count = -1;
var enumerable = lst as IEnumerable<object>;
if (enumerable != null)
    count = enumerable.Count();
else
{
    var nonGenericEnumerable = lst as IEnumerable;
    count = nonGenericEnumerable.Cast<object>().Count();
}
For Count() to be available, you need to add using System.Linq; to your .cs file.
Please note that this code has one big advantage: If the collection implements ICollection<T> - like List<T> or strong typed arrays of reference types - this code executes in O(1) [Assuming the concrete implementation of ICollection<T>.Count executes in O(1)]. Only if it doesn't - like ArrayList or strong typed arrays of value types - does this code execute in O(n) and additionally, it will box the items in the case of an array of value types.
You could use LINQ.
var count = ((IEnumerable)o).Cast<object>().Count();
Ensure that the type of o implements IEnumerable and that you have using System.Linq; at the top of your file.
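Applied to the three containers from the question, that one-liner can be wrapped as follows. This is a sketch only; the helper CountItems and its -1 return value for non-enumerable objects are assumptions made for this example.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class ContainerCountDemo
{
    // Hypothetical helper: returns -1 when o is not enumerable at all.
    public static int CountItems(object o)
    {
        var sequence = o as IEnumerable;
        return sequence == null ? -1 : sequence.Cast<object>().Count();
    }

    public static void Main()
    {
        object[] arrObj = { 1, 2 };
        var al = new ArrayList(2) { 1, 2 };
        var lst = new List<object>(2); // capacity 2, but no items added yet

        Console.WriteLine(CountItems(arrObj)); // 2
        Console.WriteLine(CountItems(al));     // 2
        Console.WriteLine(CountItems(lst));    // 0
        Console.WriteLine(CountItems(42));     // -1
    }
}
```

The List<object> case is a reminder that the constructor argument sets the capacity, not the number of items.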
Well the most basic interface it could implement would be IEnumerable. Unfortunately even Enumerable.Count from LINQ is implemented for IEnumerable<T>, but you could easily write your own:
public static int Count(IEnumerable sequence)
{
    // Shortcut for any ICollection implementation
    var collection = sequence as ICollection;
    if (collection != null)
    {
        return collection.Count;
    }
    var iterator = sequence.GetEnumerator();
    try
    {
        int count = 0;
        while (iterator.MoveNext())
        {
            count++;
        }
        return count;
    }
    finally
    {
        IDisposable disposable = iterator as IDisposable;
        if (disposable != null)
        {
            disposable.Dispose();
        }
    }
}
Note that this is basically equivalent to:
int count = 0;
foreach (object item in sequence)
{
    count++;
}
... except that because it never uses Current, it wouldn't need to do any boxing if your container was actually an int[] for example.
Call it with:
var sequence = container as IEnumerable;
if (sequence != null)
{
    int count = Count(sequence);
    // Use the count
}
It's worth noting that avoiding boxing really is a bit of a micro-optimization: it's unlikely to really be significant. But you can do it once, just in this method, and then take advantage of it everywhere.

IEnumerable to something that I can get a Count from?

I've got this:
private IEnumerable _myList;
I need to get a count off of that object. I was previously typing _myList to an array and getting the length, but now we are using this same bit of code with a different kind of object. It's still a Collection type (it's a strongly typed Subsonic Collection object), and everything works great, except for the bit that we need to get the total number of items in the object.
I've tried typing it to CollectionBase, and many many other types, but nothing works that will let me get a .Count or .Length or anything like that.
Can anyone point me in the right direction?
EDIT: I'm not using 3.5, I'm using 2. So, anything dealing with Linq won't work. Sorry for not posting this earlier.
Is this actually IEnumerable instead of IEnumerable<T>? If so, LINQ won't help you directly. (You can use Cast<T>() as suggested elsewhere, but that will be relatively slow - in particular, it won't be optimised for IList/IList<T> implementations.)
I suggest you write:
public static int Count(this IEnumerable sequence)
{
    if (sequence == null)
    {
        throw new ArgumentNullException("sequence");
    }
    // Optimisation: won't optimise for collections which
    // implement ICollection<T> but not ICollection, admittedly.
    ICollection collection = sequence as ICollection;
    if (collection != null)
    {
        return collection.Count;
    }
    IEnumerator iterator = sequence.GetEnumerator();
    try
    {
        int count = 0;
        while (iterator.MoveNext())
        {
            // Don't bother accessing Current - that might box
            // a value, and we don't need it anyway
            count++;
        }
        return count;
    }
    finally
    {
        IDisposable disposable = iterator as IDisposable;
        if (disposable != null)
        {
            disposable.Dispose();
        }
    }
}
The System.Linq.Enumerable.Count extension method does this for a typed IEnumerable<T>.
For an untyped IEnumerable try making your own extension:
public static int Count(this IEnumerable source)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    ICollection collectionSource = source as ICollection;
    if (collectionSource != null)
    {
        return collectionSource.Count;
    }
    int num = 0;
    IEnumerator enumerator = source.GetEnumerator();
    //try-finally block to ensure Enumerator gets disposed if disposable
    try
    {
        while (enumerator.MoveNext())
        {
            num++;
        }
    }
    finally
    {
        // check for disposal
        IDisposable disposableEnumerator = enumerator as IDisposable;
        if (disposableEnumerator != null)
        {
            disposableEnumerator.Dispose();
        }
    }
    return num;
}
If you're using .NET 3.5, you can use Enumerable.Count() to get the count from any IEnumerable<T>.
This will not work off a non-generic IEnumerable, though - it requires IEnumerable<T>.
This should work, though, since Subsonic's collection classes implement the appropriate interfaces for you. You'll need to change your definition from IEnumerable to IEnumerable<MyClass>.
LINQ provides a Count() extension method.
using System.Linq;
...
var count = _myList.Count();
The type you use is IEnumerable, which doesn't have a Count property. Its generic equivalent, IEnumerable<T>, doesn't have one either, but LINQ provides a Count() extension method for it.
The obvious solution is to use IEnumerable<T>, but if you can't, you could do something like this:
_myList.Cast<MyListItemType>().Count()
The cast is an easy way to convert an IEnumerable to an IEnumerable<SomeType>, but obviously it is not the best way to get the count performance-wise.
If performance is a factor, I'd just loop through the values to get the count, unless you know the underlying collection has a Count property (see Jon Skeet's answer...).
If you include the System.Linq namespace, IEnumerable<T> has a Count() extension method available. You can write your own extension method to get it on the non-generic version. Note this method will box value types, so if that might end up being a performance concern for you, go with Jon Skeet's solution. This is just simpler.
public static int Count(this IEnumerable enumerable)
{
    int count = 0;
    foreach (object item in enumerable)
    {
        count++;
    }
    return count;
}
What about calling .ToList()?
If the underlying object implements ICollection, then you can use the .Count property.
I needed the same and I created IEnumerableList. The reason was that I didn't want to enumerate the whole object every time I need the count, as the Count() extension method does.
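A LINQ-free check for that case might look like the sketch below. The helper TryGetCount and the Lazy sequence are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections;

public static class CollectionCountDemo
{
    // Hypothetical helper: succeeds only when the sequence exposes ICollection.Count.
    public static bool TryGetCount(IEnumerable sequence, out int count)
    {
        ICollection collection = sequence as ICollection;
        count = collection != null ? collection.Count : 0;
        return collection != null;
    }

    // A lazy sequence that is enumerable but not a collection.
    public static IEnumerable Lazy()
    {
        yield return 1;
        yield return 2;
    }

    public static void Main()
    {
        int count;
        Console.WriteLine(TryGetCount(new ArrayList { 1, 2, 3 }, out count)); // True
        Console.WriteLine(count);                                             // 3
        Console.WriteLine(TryGetCount(Lazy(), out count));                    // False
    }
}
```

When the check fails, counting by enumeration, as in the other answers, is the remaining option.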
More about it here: http://fknet.wordpress.com/2010/08/11/string-formatwith-extension/

Checking if a list is empty with LINQ

What's the "best" (taking both speed and readability into account) way to determine if a list is empty? Even if the list is of type IEnumerable<T> and doesn't have a Count property.
Right now I'm tossing up between this:
if (myList.Count() == 0) { ... }
and this:
if (!myList.Any()) { ... }
My guess is that the second option is faster, since it'll come back with a result as soon as it sees the first item, whereas the first option (for an IEnumerable) will need to visit every item to return the count.
That being said, does the second option look as readable to you? Which would you prefer? Or can you think of a better way to test for an empty list?
Edit: @lassevk's response seems to be the most logical, coupled with a bit of runtime checking to use a cached count if possible, like this:
public static bool IsEmpty<T>(this IEnumerable<T> list)
{
    if (list is ICollection<T>) return ((ICollection<T>)list).Count == 0;
    return !list.Any();
}
You could do this:
public static Boolean IsEmpty<T>(this IEnumerable<T> source)
{
    if (source == null)
        return true; // or throw an exception
    return !source.Any();
}
Edit: Note that simply using the .Count method will be fast if the underlying source actually has a fast Count property. A valid optimization above would be to detect a few base types and simply use the .Count property of those, instead of the .Any() approach, but then fall back to .Any() if no guarantee can be made.
I would make one small addition to the code you seem to have settled on: check also for ICollection, as this is implemented even by some non-obsolete generic classes as well (i.e., Queue<T> and Stack<T>). I would also use as instead of is as it's more idiomatic and has been shown to be faster.
public static bool IsEmpty<T>(this IEnumerable<T> list)
{
    if (list == null)
    {
        throw new ArgumentNullException("list");
    }
    var genericCollection = list as ICollection<T>;
    if (genericCollection != null)
    {
        return genericCollection.Count == 0;
    }
    var nonGenericCollection = list as ICollection;
    if (nonGenericCollection != null)
    {
        return nonGenericCollection.Count == 0;
    }
    return !list.Any();
}
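The reason for the second check can be seen by probing the interfaces directly. In this sketch the class and method names are invented; it only reports which of the two collection interfaces a sequence implements.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CollectionInterfaceDemo
{
    public static bool IsGenericCollection<T>(IEnumerable<T> source)
    {
        return source is ICollection<T>;
    }

    public static bool IsNonGenericCollection<T>(IEnumerable<T> source)
    {
        return source is ICollection;
    }

    public static void Main()
    {
        var stack = new Stack<int>();
        var list = new List<int>();

        Console.WriteLine(IsGenericCollection(stack));    // False
        Console.WriteLine(IsNonGenericCollection(stack)); // True
        Console.WriteLine(IsGenericCollection(list));     // True
        Console.WriteLine(IsNonGenericCollection(list));  // True
    }
}
```

Without the non-generic check, a Stack<T> or Queue<T> would always fall through to Any().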
LINQ itself must be doing some serious optimization around the Count() method somehow.
Does this surprise you? I imagine that for IList implementations, Count simply reads the number of elements directly while Any has to query the IEnumerable.GetEnumerator method, create an instance and call MoveNext at least once.
/EDIT @Matt:
I can only assume that the Count() extension method for IEnumerable is doing something like this:
Yes, of course it does. This is what I meant. Actually, it uses ICollection instead of IList but the result is the same.
I just wrote up a quick test, try this:
IEnumerable<Object> myList = new List<Object>();
Stopwatch watch = new Stopwatch();
int x;
watch.Start();
for (var i = 0; i <= 1000000; i++)
{
    if (myList.Count() == 0) x = i;
}
watch.Stop();
Stopwatch watch2 = new Stopwatch();
watch2.Start();
for (var i = 0; i <= 1000000; i++)
{
    if (!myList.Any()) x = i;
}
watch2.Stop();
Console.WriteLine("myList.Count() = " + watch.ElapsedMilliseconds.ToString());
Console.WriteLine("myList.Any() = " + watch2.ElapsedMilliseconds.ToString());
Console.ReadLine();
The second is almost three times slower :)
Trying the stopwatch test again with a Stack, an array, or other scenarios, the result really seems to depend on the type of list, because for those Count proves to be slower.
So I guess it depends on the type of list you're using!
(Just to point out, I put 2000+ objects in the List and Count was still faster, the opposite of what I saw with other types.)
List.Count is O(1) according to Microsoft's documentation:
http://msdn.microsoft.com/en-us/library/27b47ht3.aspx
So just use List.Count == 0; it's much faster than a query.
This is because the list keeps a data member holding the count, which is updated any time something is added to or removed from the list, so when you call List.Count it doesn't have to iterate through every element to get it; it just returns the data member.
The second option is much quicker if you have multiple items.
Any() returns as soon as 1 item is found.
Count() has to keep going through the entire list.
For instance suppose the enumeration had 1000 items.
Any() would check the first one, then return true.
Count() would return 1000 after traversing the entire enumeration.
This is potentially worse if you use one of the predicate overloads - Count() still has to check every single item, even if there is only one match.
You get used to using the Any one - it does make sense and is readable.
One caveat - if you have a List, rather than just an IEnumerable then use that list's Count property.
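The number of items each call visits can be measured with a lazy sequence that counts what it produces. This is a sketch for illustration; VisitCountDemo, Numbers, and the two Visits methods are invented names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VisitCountDemo
{
    private static int _visited;

    // Lazy sequence that records how many items were actually produced.
    private static IEnumerable<int> Numbers(int n)
    {
        for (int i = 0; i < n; i++)
        {
            _visited++;
            yield return i;
        }
    }

    public static int VisitsForAny(int n)
    {
        _visited = 0;
        Numbers(n).Any();
        return _visited;
    }

    public static int VisitsForCount(int n)
    {
        _visited = 0;
        Numbers(n).Count();
        return _visited;
    }

    public static void Main()
    {
        Console.WriteLine(VisitsForAny(1000));   // 1
        Console.WriteLine(VisitsForCount(1000)); // 1000
    }
}
```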
@Konrad what surprises me is that in my tests, I'm passing the list into a method that accepts IEnumerable<T>, so the runtime can't optimize it by calling the Count() extension method for IList<T>.
I can only assume that the Count() extension method for IEnumerable is doing something like this:
public static int Count<T>(this IEnumerable<T> list)
{
    if (list is IList<T>) return ((IList<T>)list).Count;
    int i = 0;
    foreach (var t in list) i++;
    return i;
}
... in other words, a bit of runtime optimization for the special case of IList<T>.
/EDIT @Konrad +1 mate - you're right about it more likely being on ICollection<T>.
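One way to observe the ICollection<T> shortcut is a collection that reports a size but throws when enumerated. The class CountOnlyCollection below is a hypothetical test double written for this sketch, not anything from the framework.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection that knows its size but refuses to be enumerated.
public sealed class CountOnlyCollection : ICollection<int>
{
    private readonly int _count;

    public CountOnlyCollection(int count) { _count = count; }

    public int Count { get { return _count; } }
    public bool IsReadOnly { get { return true; } }

    public IEnumerator<int> GetEnumerator()
    {
        throw new InvalidOperationException("The sequence was enumerated.");
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    public void Add(int item) { throw new NotSupportedException(); }
    public void Clear() { throw new NotSupportedException(); }
    public bool Contains(int item) { return false; }
    public void CopyTo(int[] array, int arrayIndex) { throw new NotSupportedException(); }
    public bool Remove(int item) { throw new NotSupportedException(); }
}

public static class CountShortcutDemo
{
    // The parameter type is IEnumerable<int>, so this binds to Enumerable.Count().
    public static int CountViaLinq(IEnumerable<int> source)
    {
        return source.Count();
    }

    public static void Main()
    {
        // Prints 1000; GetEnumerator is never called, otherwise this would throw.
        Console.WriteLine(CountViaLinq(new CountOnlyCollection(1000)));
    }
}
```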
Ok, so what about this one?
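public static bool IsEmpty<T>(this IEnumerable<T> enumerable)
{
    // Dispose the enumerator once the first MoveNext has answered the question.
    using (var enumerator = enumerable.GetEnumerator())
    {
        return !enumerator.MoveNext();
    }
}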
EDIT: I've just realized that someone has sketched this solution already. It was mentioned that the Any() method will do this, but why not do it yourself? Regards
Another idea:
if(enumerable.FirstOrDefault() != null)
However I like the Any() approach more.
This was critical to get this to work with Entity Framework:
var genericCollection = list as ICollection<T>;
if (genericCollection != null)
{
//your code
}
If I check with Count(), LINQ executes a "SELECT COUNT(*) ..." in the database, but I only need to check whether the result contains data, so I resolved this by using FirstOrDefault() instead of Count().
Before
var cfop = from tabelaCFOPs in ERPDAOManager.GetTable<TabelaCFOPs>()
           select tabelaCFOPs;
if (cfop.Count() > 0)
{
    var itemCfop = cfop.First();
    //....
}
After
var cfop = from tabelaCFOPs in ERPDAOManager.GetTable<TabelaCFOPs>()
           select tabelaCFOPs;
var itemCfop = cfop.FirstOrDefault();
if (itemCfop != null)
{
    //....
}
private bool NullTest<T>(T[] list, string attribute)
{
bool status = false;
if (list != null)
{
int flag = 0;
var property = GetProperty(list.FirstOrDefault(), attribute);
foreach (T obj in list)
{
if (property.GetValue(obj, null) == null)
flag++;
}
status = flag == 0 ? true : false;
}
return status;
}
public PropertyInfo GetProperty<T>(T obj, string str)
{
Expression<Func<T, string, PropertyInfo>> GetProperty = (TypeObj, Column) => TypeObj.GetType().GetProperty(TypeObj
.GetType().GetProperties().ToList()
.Find(property => property.Name
.ToLower() == Column
.ToLower()).Name.ToString());
return GetProperty.Compile()(obj, str);
}
Here's my implementation of Dan Tao's answer, allowing for a predicate:
public static bool IsEmpty<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if (source == null) throw new ArgumentNullException("source");
    if (IsCollectionAndEmpty(source)) return true;
    return !source.Any(predicate);
}
public static bool IsEmpty<TSource>(this IEnumerable<TSource> source)
{
    if (source == null) throw new ArgumentNullException("source");
    if (IsCollectionAndEmpty(source)) return true;
    return !source.Any();
}
private static bool IsCollectionAndEmpty<TSource>(IEnumerable<TSource> source)
{
    var genericCollection = source as ICollection<TSource>;
    if (genericCollection != null) return genericCollection.Count == 0;
    var nonGenericCollection = source as ICollection;
    if (nonGenericCollection != null) return nonGenericCollection.Count == 0;
    return false;
}
List<T> li = new List<T>();
(li.First().DefaultValue.HasValue) ? string.Format("{0:yyyy/MM/dd}", sender.First().DefaultValue.Value) : string.Empty;
myList.ToList().Count == 0. That's all
This extension method works for me:
public static bool IsEmpty<T>(this IEnumerable<T> enumerable)
{
    try
    {
        enumerable.First();
        return false;
    }
    catch (InvalidOperationException)
    {
        return true;
    }
}
