I understand the hierarchy between the two and know how to convert between them, but is there anything fundamentally wrong with declaring a list like this?
IEnumerable<int> _bookmarkedDeals = new List<int>();
I don't need to change the list at any point; it's either created or recreated as a whole.
As an alternative, possibly this:
IEnumerable<int> _bookmarkedDeals = Enumerable.Empty<int>();
Well, in this case, _bookmarkedDeals will be empty, so the declaration is somewhat useless. That being said, there's nothing wrong with treating a List as an IEnumerable.
(Unlike all the other answers here.) Yes, there is something very wrong with this declaration: you won't be able to use any of the List<T> features without casting back to List<T> all the time.
It's good practice to return this type from your method if no list operations will be performed on it later. But while you're in the same method, you're just making your life harder and your code less performant (because of the casts).
There is absolutely no problem with your declaration, although some people do like to declare the variable type precisely, without any abstraction.
There is nothing wrong with that declaration, but unless you have a specific restriction, IList<int> is a more appropriate interface for a list.
Nothing wrong.
If you never change the list in any way you could possibly use an array - it is slightly lighter than a list.
There's nothing fundamentally wrong with it, but why would you want to slice off some of the functionality that List provides (that IEnumerable doesn't)?
Nothing wrong with that, that's good design.
Have a read of Arrays Considered Somewhat Harmful.
Fabulous read about selecting the right interface.
No, there is nothing wrong with it.
If this is a local variable, I see no benefit over just using var to create a variable of type List<int>. If you are just enumerating, it does no harm at all, though. If it is a field, perhaps even publicly exposed, it can be very sensible not to expose it as a List<int> or IList<int>, so that you keep the freedom to choose the concrete type.
There is no syntax problem, but I don't see how it could be useful. If you don't insert any items on this line, the list is going to stay empty forever unless you cast it back to List<int>, so why would you want to declare it as IEnumerable<int>?
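A minimal sketch of the trade-off the answers above describe, assuming a local variable rather than a field: the declared (static) type decides which members you can call, regardless of the object behind it. The variable names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Declared as IEnumerable<int>: only the interface members (plus LINQ extensions) are visible.
IEnumerable<int> declaredAsSequence = new List<int>();
// declaredAsSequence.Add(1);            // would not compile: IEnumerable<int> has no Add
((List<int>)declaredAsSequence).Add(1);  // compiles, but only through a cast back to List<int>

// Declared with var: the static type is List<int>, so the whole List<int> API is available.
var declaredWithVar = new List<int>();
declaredWithVar.Add(2);

// If the sequence is only ever replaced wholesale, any IEnumerable<int> can be assigned.
IEnumerable<int> bookmarkedDeals = Enumerable.Empty<int>();
bookmarkedDeals = new[] { 10, 20, 30 };

Console.WriteLine($"{declaredAsSequence.Count()} {declaredWithVar.Count} {bookmarkedDeals.Count()}");
```

This is why the "created or recreated as a whole" case in the question fits IEnumerable<int> well, while code that builds the list up item by item is easier with var.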
Related
In C#, I was told that if a container class such as List(T) is first upcast to a container interface such as IEnumerable and then iterated over using foreach, runtime garbage will be created. Also, even when fully downcast, I was told that iterating over Collection(T) also creates references on the heap. I understand that this is a result of a virtual call to GetEnumerator(), which may return a reference-type or value-type result.
Inspection of the MSDN docs for value types clearly lists all enumerations as value types. If an enumeration consists of an enumerator list, then aren't these enumerators value types as per the docs? Are they boxed? Or are they completely unrelated to each other but with similar names? Or something else entirely?
I'm not sure how to unify these two statements and I was hoping someone could explain it to me more plainly.
Thanks.
EDIT: Question rephrased, taking into account commenter suggestions on the use of words such as 'never' and 'un-necessary'.
Enumerations (enums) are unrelated to enumerators.
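To make the distinction concrete (a small type check, not a benchmark): List<T>.GetEnumerator() returns the struct List<T>.Enumerator, which foreach uses directly when the variable is typed as List<T>. Through IEnumerable<T>, the same struct comes back boxed as an IEnumerator<T>, which is where the heap allocation comes from. An enum such as DayOfWeek is a separate concept that merely shares part of the name.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int> { 1, 2, 3 };

// foreach over a variable typed List<int> binds to this method, which returns a struct.
List<int>.Enumerator structEnumerator = list.GetEnumerator();
bool enumeratorIsValueType = structEnumerator.GetType().IsValueType;

// Through the interface, the same struct is returned as a boxed IEnumerator<int>.
IEnumerable<int> sequence = list;
IEnumerator<int> boxedEnumerator = sequence.GetEnumerator();
bool boxedRuntimeTypeIsSame = boxedEnumerator.GetType() == typeof(List<int>.Enumerator);

// An enum is just a named set of integral constants, unrelated to enumerators.
bool dayOfWeekIsEnum = typeof(DayOfWeek).IsEnum;

Console.WriteLine($"{enumeratorIsValueType} {boxedRuntimeTypeIsSame} {dayOfWeekIsEnum}");
```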
And
I was told that to avoid...
Seems like a very premature and unnecessary optimization.
Previously, I used the IEnumerable<T> type whenever I passed a collection as a method parameter.
But recently, I had a problem with a collection of type IEnumerable<T> that was created in a similar way:
var peoples = names.Select(name => new People(name));
In this case, every time I use the collection peoples (for example, in a foreach), it creates new instances of the People class, and that can easily cause bugs.
So I want to ask whether it is right to use an IEnumerable<T> parameter. I think it may cause problems (see the example above) and that this type should not be used. What alternatives do you recommend (ICollection<T>, IList<T>, etc.), and when should each one be used?
Or do you think this is a silly question, because only a fool would create objects in a Select call?
Of course, I know that I can use ToArray() or ToList() and thus solve the problem. But someone else calling this method may not know that. I would like to know how to prevent this by choosing the correct parameter type. A List or an array is too specific for me when I just want to "enumerate" objects.
IEnumerable is not a collection. It is just something that you can "enumerate". The problem is not passing IEnumerable to your method, the problem is that if you are using LINQ (Select method), every time you read the enumerable it will execute the code again. If you only want to have it executed once, you can use the ToArray() or ToList() methods:
var peoples = names.Select(name => new People(name)).ToList();
Like this you can still pass it to any method accepting an IEnumerable (or List) and it will only create one instance for each person.
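A small sketch of the behaviour described above, with StringBuilder standing in for the question's People class (an assumption made so the example is self-contained): the lazy query re-runs its selector on every enumeration, while ToList() runs it once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

var names = new[] { "Ann", "Bob" };
int constructed = 0;

// Deferred: nothing is created until the query is enumerated.
IEnumerable<StringBuilder> lazy = names.Select(name => { constructed++; return new StringBuilder(name); });

// Enumerating twice runs the selector twice per name.
foreach (var item in lazy) { }
foreach (var item in lazy) { }
int afterLazy = constructed;                                      // 4
bool sameInstance = ReferenceEquals(lazy.First(), lazy.First());  // a fresh object each time

constructed = 0;
// Materialized: the selector runs once per name, and the list holds those objects.
List<StringBuilder> materialized = names.Select(name => { constructed++; return new StringBuilder(name); }).ToList();
foreach (var item in materialized) { }
foreach (var item in materialized) { }
int afterToList = constructed;                                    // 2

Console.WriteLine($"{afterLazy} {sameInstance} {afterToList}");
```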
Edit:
Your method shouldn't worry about these kinds of problems; it's the caller's problem. There might be perfectly good reasons to call your method with an enumerable instead of a list. The caller should know that the enumerable can give different results each time it is enumerated (for example, when it is passed to different methods), so you shouldn't worry about that.
The only exception is if you enumerate the parameter more than once in the method itself. In this case you should cache the parameter in a list inside the method and then enumerate the list instead as many times as you need.
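A sketch of that exception, assuming a hypothetical CountAboveAverage method that needs two passes (the average, then a count): materializing the parameter once means the caller's query runs only once, however many passes the method makes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int selectorCalls = 0;

// Hypothetical two-pass method: cache the input first, then make both passes over the cache.
static int CountAboveAverage(IEnumerable<int> source)
{
    List<int> cached = source.ToList();        // the only enumeration of the caller's sequence
    if (cached.Count == 0) return 0;
    double average = cached.Average();          // first pass, over the cached list
    return cached.Count(x => x > average);      // second pass, over the cached list
}

// A lazy query whose selector counts how often it runs.
IEnumerable<int> query = new[] { 1, 2, 3, 4, 10 }.Select(x => { selectorCalls++; return x; });
int above = CountAboveAverage(query);

Console.WriteLine($"{above} value(s) above average; selector ran {selectorCalls} times");
```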
The ToArray and ToList suggestions expose more than one might initially think. We're tempted to read this advice as simply saying that calling ToList/ToArray either at the call site or as the first thing in your method corrects the issue, but your question is whether IEnumerable<T> is appropriate. You could change the parameter type from IEnumerable<T> to something else (like ICollection<T>), which puts the onus on the caller to convert to something that implements this interface (note that T[], List<T>, IList<T> and Collection<T> all do). Part of the problem with this approach is that these interfaces represent mutable collections, whereas IEnumerable<T> advertises that the method only enumerates the items; that's just one of the reasons I don't like this approach.
What if the potential bug was not really a bug at all? Perhaps the caller intends these to be defensive copies or dumb data objects; in these latter cases it may be inefficient by some measure (but so is requiring them to make a copy), and in this proposed use it definitely is not a bug. Likewise, a one-size-fits-all recommendation doesn't fit, because IEnumerable<T> objects never have to terminate: requiring an array type to be passed in would mean that infinite (i.e. computed) or merely very large IEnumerable<T> objects would be out of the question.
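A minimal sketch of the infinite case, using a hypothetical Naturals iterator: it is a perfectly valid IEnumerable<long>, but it could never be turned into an array or an ICollection<long>. A method that only needs to enumerate can still accept it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical computed sequence with no end; yield return produces values on demand.
static IEnumerable<long> Naturals()
{
    long n = 0;
    while (true)
        yield return n++;
}

// Only enumerates, so IEnumerable<long> is enough; Take stops after 'count' items.
static long SumOfFirst(IEnumerable<long> source, int count) => source.Take(count).Sum();

long firstFive = SumOfFirst(Naturals(), 5);   // 0 + 1 + 2 + 3 + 4
Console.WriteLine(firstFive);
```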
Regardless, I think you're right to pose questions regarding defensive programming. However, in this case I think the best solution is to stick with IEnumerable<T> and educate rather than limit your callers based on the speculation that they might, in some limited circumstances, introduce a bug.
Paraphrasing a quote:
The problem with designing to prevent issues that idiots will make is that the idiots are so damned ingenious.
Hope this helps. Cheers!
I usually find myself doing something like:
string[] things = arrayReturningMethod();
int index = things.ToList<string>().FindIndex((s) => s.Equals("FOO"));
//do something with index
return things.Distinct(); //which returns an IEnumerable<string>
and I find all this mixup of types/interface a bit confusing and it tickles my potential performance problem antennae (which I ignore until proven right, of course).
Is this idiomatic and proper C# or is there a better alternative to avoid casting back and forth to access the proper methods to work with the data?
EDIT:
The question is actually twofold:
When is it proper to use either the IEnumerable interface or an array or a list (or any other IEnumerable implementing type) directly (when accepting parameters)?
Should you freely move between IEnumerables (implementation unknown) and lists, IEnumerables and arrays, and arrays and Lists, or is that non-idiomatic (there are better ways to do it), non-performant (not typically relevant, but might be in some cases), or just plain ugly (unmaintainable, unreadable)?
In regards to performance...
Converting from List to T[] involves copying all the data from the original list to a newly allocated array.
Converting from T[] to List also involves copying all the data from the original array to a newly allocated List.
Converting from either List or T[] to IEnumerable is an upcast (an implicit reference conversion), which costs essentially nothing: no data is copied.
Converting from IEnumerable to List involves a downcast, which costs a few CPU cycles for the runtime type check.
Converting from IEnumerable to T[] also involves a downcast.
You can't cast an IEnumerable to T[] or List unless it was a T[] or List respectively to begin with. You can use the ToArray or ToList functions, but those will also result in a copy being made.
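A short sketch of the difference between casting and copying: the upcast and the successful downcast both refer to the very same List object, the as operator returns null when the runtime type doesn't match, and ToList() always allocates a new list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<int> { 1, 2, 3 };
IEnumerable<int> sequence = list;               // upcast: no copy, same object

var backToList = (List<int>)sequence;           // downcast: works because the object really is a List<int>
var asArray = sequence as int[];                // 'as' gives null instead of throwing on a type mismatch
List<int> copy = sequence.ToList();             // always allocates a new list and copies the elements

bool sameObject = ReferenceEquals(backToList, list);
bool copyIsNewObject = !ReferenceEquals(copy, list);
Console.WriteLine($"{sameObject} {asArray is null} {copyIsNewObject}");
```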
Accessing all the values in order from start to end in a T[] will, in a simple loop, be optimized to use straightforward pointer arithmetic, which makes it the fastest of them all.
Accessing all the values in order from start to end in a List involves a check on each iteration to make sure that you aren't accessing a value outside the array's bounds, and then the actual accessing of the array value.
Accessing all the values in an IEnumerable involves obtaining an enumerator object, calling its MoveNext() method, which advances it to the next element, and then reading the Current property, which gives you the actual value and sticks it in the variable that you specified in your foreach statement. Generally, this isn't as bad as it sounds.
Accessing an arbitrary value in an IEnumerable involves starting at the beginning and calling MoveNext() as many times as you need to get to that value. Generally, this is as bad as it sounds. (LINQ's ElementAt() does take a shortcut when the underlying object implements IList<T>, but you can't count on that for an arbitrary IEnumerable.)
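Written out by hand, a foreach over an IEnumerable<T> does roughly the following (a simplified sketch; the compiler's real expansion also disposes the enumerator in a finally block, which the using statement approximates here):

```csharp
using System;
using System.Collections.Generic;

IEnumerable<string> words = new[] { "alpha", "beta", "gamma" };

// Approximately what `foreach (var w in words) { seen.Add(w); }` does.
var seen = new List<string>();
using (IEnumerator<string> e = words.GetEnumerator())
{
    while (e.MoveNext())        // advance; returns false once the sequence is exhausted
        seen.Add(e.Current);    // read the element the enumerator currently points at
}

Console.WriteLine(string.Join(",", seen));
```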
In regards to idioms...
In general, IEnumerable is useful for public properties, function parameters, and often for return values -- and only if you know that you're going to be using the values sequentially.
For instance, if you had a function PrintValues written as PrintValues(List<T> values), it would only be able to deal with List values, so the user would first have to convert if, for instance, they were using a T[]. Likewise if the function were PrintValues(T[] values). But if it were PrintValues(IEnumerable<T> values), it would be able to deal with Lists, T[]s, stacks, hashtables, dictionaries, strings, sets, etc.: any collection that implements IEnumerable, which is practically every collection.
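A sketch of that idea, using a hypothetical FormatValues helper (a string-returning variant of PrintValues, so the result is easy to check): because the parameter is IEnumerable<T>, arrays, lists, stacks and even strings can be passed without any conversion.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: accepts any IEnumerable<T> and joins the values with spaces.
static string FormatValues<T>(IEnumerable<T> values) => string.Join(" ", values);

string fromArray = FormatValues(new[] { 1, 2, 3 });
string fromList = FormatValues(new List<int> { 4, 5 });
string fromStack = FormatValues(new Stack<int>(new[] { 6, 7 }));  // a stack enumerates top first
string fromString = FormatValues("hi");                            // string is an IEnumerable<char>

Console.WriteLine(fromArray);
Console.WriteLine(fromList);
Console.WriteLine(fromStack);
Console.WriteLine(fromString);
```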
In regards to internal use...
Use a List only if you're not sure how many items will need to be in it.
Use a T[] if you know how many items will need to be in it, but need to access the values in an arbitrary order.
Stick with the IEnumerable if that's what you've been given and you just need to use it sequentially. Many functions will return IEnumerables. If you do need to access values from an IEnumerable in an arbitrary order, use ToArray().
Also, note that casting is different from using ToArray() or ToList(): the latter involves copying the values, which is indeed a performance and memory hit if you have a lot of elements. The former simply says "A dog is an animal, so like any animal, it can eat" (upcast) or "This animal happens to be a dog, so it can bark" (downcast). Likewise, all Lists and T[]s are IEnumerables, but only some IEnumerables are Lists or T[]s.
A good rule of thumb is to always use IEnumerable (when declaring your variables/method parameters/method return types/properties/etc.) unless you have a good reason not to. It is by far the most compatible type when working with other methods, especially extension methods.
Well, you've got two apples and an orange that you are comparing.
The two apples are the array and the List.
An array in C# is a C-style array that has garbage collection built in. The upside of using them is that they have very little overhead, assuming you don't need to move things around. The downside is that their size is fixed, so adding or removing elements means allocating a new array and copying the contents over.
A List is a C#-style dynamic array (similar to the vector<> class in C++). It keeps its items in an internal array that it reallocates and copies as it grows, so there is a little more overhead, but it is much more convenient when you need to add and remove items a lot, because the List manages the resizing for you.
The best comparison I could give is saying that arrays are to Lists as strings are to StringBuilders.
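A small sketch of what "dynamic array" means for List<T>: its Capacity is the length of the internal array, which grows only when Count would exceed it. The exact growth policy is a runtime implementation detail, so the sketch just prints the capacities rather than promising specific values.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int>();
var capacities = new List<int>();

for (int i = 0; i < 10; i++)
{
    list.Add(i);
    capacities.Add(list.Capacity);   // length of the internal array after this Add
}

list.Insert(0, -1);                  // inserting at the front shifts every existing element one slot right

Console.WriteLine(string.Join(",", capacities));
Console.WriteLine($"{list[0]} {list[1]} (Count = {list.Count})");
```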
The orange is IEnumerable. This is not a concrete data structure but an interface. When a class implements the IEnumerable interface, it allows that object to be used in a foreach loop.
When you return the list (as you did in your example), you were not converting the list to an IEnumerable. A list already is an IEnumerable object.
EDIT: When to convert between the two:
It depends on the application. There is very little that can be done with an array that cannot be done with a List, so I would generally recommend the List. Probably the best thing to do is to make a design decision that you are going to use one or the other, that way you don't have to switch between the two. If you rely on an external library, abstract it away to maintain consistent usage.
Hope this clears a little bit of the fog.
Looks to me like the problem is that you haven't bothered learning how to search an array. Hint: Array.IndexOf or Array.BinarySearch depending on whether the array is sorted.
You're right that converting to a list is a bad idea: it wastes space and time and makes the code less readable. Also, blindly upcasting to IEnumerable slows matters down and also completely prevents use of certain algorithms (such as binary search).
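A sketch of the array search methods mentioned above, applied to data shaped like the question's string array: no List<string> copy is needed, and BinarySearch only gives meaningful results on a sorted array.

```csharp
using System;

string[] things = { "BAR", "FOO", "BAZ" };

// Search the array in place.
int byValue = Array.IndexOf(things, "FOO");                   // equality search, O(n)
int byPredicate = Array.FindIndex(things, s => s == "FOO");   // predicate search, O(n)

// BinarySearch requires the array to be sorted with the same comparer (here, the default).
string[] sorted = { "BAR", "BAZ", "FOO" };
int bySearch = Array.BinarySearch(sorted, "FOO");             // O(log n)

Console.WriteLine($"{byValue} {byPredicate} {bySearch}");
```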
I try to avoid rapidly jumping between data types if it can be avoided.
It must be the case that each situation similar to the one you described is sufficiently different to prevent a dogmatic rule about transforming your types; however, it is generally good practice to select a data structure that provides, as closely as possible, the interface you need without needlessly copying elements into new data structures.
When to use what?
I would suggest returning the most specific type, and taking in the most flexible type.
Like this:
public int[] DoSomething(IEnumerable<int> inputs)
{
//...
}
public List<int> DoSomethingElse(IList<int> inputs)
{
//...
}
That way you can call List<T> methods on whatever you get back from the method, in addition to treating it as an IEnumerable. On the inputs, be as flexible as possible, so you don't dictate to the users of your method what kind of collection to create.
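A minimal sketch of "return specific, accept flexible", with a hypothetical Doubled method: callers can pass a list, a lazy range or a set, and they get back an array with Length and indexing available immediately.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical method: flexible input type, concrete (array) return type.
static int[] Doubled(IEnumerable<int> inputs) => inputs.Select(x => x * 2).ToArray();

int[] fromList = Doubled(new List<int> { 1, 2 });
int[] fromRange = Doubled(Enumerable.Range(1, 3));   // a lazy sequence works too
int[] fromSet = Doubled(new HashSet<int> { 5 });

// Array-specific members are available to the caller without any conversion.
Console.WriteLine($"{fromList.Length} {fromRange[2]} {fromSet[0]}");
```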
You're right to ignore the 'performance problem' antennae until you actually have a performance problem. Most performance problems come from doing too much I/O or too much locking or doing one of them wrong, and none of these apply to this question.
My general approach is:
1. Use T[] for 'static' or 'snapshot'-style information. Use for things where calling .Add() wouldn't make sense anyway, and you don't need the extra methods List<T> gives you.
2. Accept IEnumerable<T> if you don't really care what you're given and don't need a constant-time .Length/.Count.
3. Only return IEnumerable<T> when you're doing simple manipulations of an input IEnumerable<T> or when you specifically want to make use of the yield syntax to do your work lazily.
4. In all other cases, use List<T>. It's just too flexible.
Corollary to #4: don't be afraid of ToList(). ToList() is your friend. It forces the IEnumerable<T> to evaluate right then (useful for when you're stacking several where clauses). Don't go nuts with it, but feel free to call it once you've built up your full where clause before you do the foreach over it (or the like).
Of course, this is just a rough guideline. Just please try to follow the same pattern in the same codebase -- code styles that jump around make it harder for maintenance coders to get into your frame of mind.
What's the preferred container type when returning multiple objects of the same type from a function?
Is it against good practice to return a simple array (like MyType[]), or should you wrap it in some generic container (like ICollection<MyType>)?
Thanks!
Eric Lippert has a good article on this. In case you can't be bothered to read the entire article, the answer is: return the interface.
Return an IEnumerable<T> using a yield return.
I would return an IList<T> as that gives the consumer of your function the greatest flexibility. That way if the consumer of your function only needed to enumerate the sequence they can do so, but if they want to use the sequence as a list they can do that as well.
My general rule of thumb is to accept the least restrictive type as a parameter and return the richest type I can. This is, of course, a balancing act as you don't want to lock yourself into any particular interface or implementation (but always, always try to use an interface).
This is the least presumptuous approach that you, the API developer, can take. It is not up to you to decide how a consumer of your function will use what you return; that is why you would return an IList<T> in this case, so as to give them the greatest flexibility. For the same reason, you would never presume to know what type of parameter a consumer will send you. If you only need to iterate a sequence sent to you as a parameter, then make the parameter an IEnumerable<T> rather than a List<T>.
EDIT (monoxide): Since it doesn't look like the question is going to be closed, I just want to add a link from the other question about this: Why arrays are harmful
Why not List<T>?
From the Eric Lippert post mentioned by others, I thought I would highlight this:
If I need a sequence I’ll use IEnumerable<T>, if I need a mapping from contiguous numbers to data I’ll use a List<T>, if I need a mapping across arbitrary data I’ll use a Dictionary<K,V>, if I need a set I’ll use a HashSet<T>. I simply don’t need arrays for anything, so I almost never use them. They don’t solve a problem I have better than the other tools at my disposal.
A good piece of advice that I've oft heard quoted is this:
Be liberal in what you accept, precise in what you provide.
In terms of designing your API, I'd suggest you should be returning an Interface, not a concrete type.
Taking your example method, I'd rewrite it as follows:
public IList<object> Foo()
{
List<object> retList = new List<object>();
// Blah, blah, [snip]
return retList;
}
The key is that your internal implementation choice - to use a List - isn't revealed to the caller, but you're returning an appropriate interface.
Microsoft's own guidelines on framework development recommend against returning specific types, favoring interfaces. (Sorry, I couldn't find a link for this)
Similarly, your parameters should be as general as possible - instead of accepting an array, accept an IEnumerable of the appropriate type. This is compatible with arrays as well as lists and other useful types.
Taking your example method again:
public IList<object> Foo(IEnumerable<object> bar)
{
List<object> retList = new List<object>();
// Blah, blah, [snip]
return retList;
}
If the collection that is being returned is read-only, meaning you never want the elements in the collection to be changed, then use IEnumerable<T>. This is the most basic representation of a read-only sequence of immutable (at least from the perspective of the enumeration itself) elements.
If you want it to be a self-contained collection that can be changed, then use ICollection<T> or IList<T>.
For example, if you wanted to return the results of searching for a particular set of files, then return IEnumerable<FileInfo>.
However, if you wanted to expose the files in a directory, you would expose IList/ICollection<FileInfo>, as it makes sense that you might want to change the contents of the collection.
return ICollection<type>
The advantage to generic return types, is that you can change the underlying implementation without changing the code that uses it. The advantage to returning the specific type, is you can use more type specific methods.
Always return an interface type that presents the greatest amount of functionality to the caller. So in your case ICollection<YourType> ought to be used.
Something interesting to note is that the BCL developers actually got this wrong in some places in the .NET Framework; see this Eric Lippert blog post for that story.
Why not IList<MyType>?
It supports direct indexing, which is a hallmark of arrays, without removing the possibility of returning a List<MyType> some day. If you want to suppress this feature, you probably want to return IEnumerable<MyType>.
It depends on what you plan to do with the collection you're returning. If you're just iterating, or if you only want the user to iterate, then I agree with @Daniel: return IEnumerable<T>. If you actually want to allow list-based operations, however, I'd return IList<T>.
Use generics. It's easier to interoperate with other collections classes and the type system is more able to help you with potential errors.
The old style of returning an array was a crutch before generics.
Whatever makes your code more readable, maintainable and easier for YOU.
I would have used the simple array, simpler==better most of the time.
Although I really have to see the context to give the right answer.
There are big advantages to favouring IEnumerable over anything else, as this gives you the greatest implementation flexibility and allows you to use yield return or Linq operators for lazy implementation.
If the caller wants a List<T> instead they can simply call ToList() on whatever you returned, and the overall performance will be roughly the same as if you had created and returned a new List<T> from your method.
Array is harmful, but ICollection<T> is also harmful.
ICollection<T> cannot guarantee the object will be immutable.
My recommendation is to wrap the returned object in a ReadOnlyCollection<T>.
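A sketch of that recommendation using List<T>.AsReadOnly(), which returns a ReadOnlyCollection<T> wrapper without copying. Note the nuance: callers can't modify it (mutating members of the interfaces it implements throw NotSupportedException), but it is a read-only view, not an immutable snapshot, so later changes to the backing list remain visible.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

var backing = new List<string> { "a", "b" };

// Wrap the list; no elements are copied.
ReadOnlyCollection<string> exposed = backing.AsReadOnly();

bool addRejected = false;
try
{
    ((ICollection<string>)exposed).Add("c");   // ICollection<T> is implemented, but mutation throws
}
catch (NotSupportedException)
{
    addRejected = true;
}

backing.Add("d");   // the owner can still change the backing list, and the wrapper reflects it
Console.WriteLine($"{addRejected} {exposed.Count}");
```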
I'm just wondering how other developers tackle this issue of getting 2 or 3 answers from a method.
1) return a object[]
2) return a custom class
3) use an out or ref keyword on multiple variables
4) write or borrow (F#) a simple Tuple<> generic class
http://slideguitarist.blogspot.com/2008/02/whats-f-tuple.html
I'm working on some code now that does data refreshes. From the method that does the refresh I would like to pass back (1) Refresh Start Time and (2) Refresh End Time.
At a later date I may want to pass back a third value.
Thoughts? Any good practices from open source .NET projects on this topic?
It entirely depends on what the results are. If they are related to one another, I'd usually create a custom class.
If they're not really related, I'd either use an out parameter or split the method up. If a method wants to return three unrelated items, it's probably doing too much. The exception to this is when you're talking across a web-service boundary or something else where a "purer" API may be too chatty.
For two, usually 4)
More than that, 2)
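For the tuple option, C# 7 and later have value tuples with named elements built in, so no hand-written Tuple class is needed. A minimal sketch, with RunRefresh as a hypothetical stand-in for the question's refresh method:

```csharp
using System;
using System.Threading;

// Hypothetical refresh method returning two related values as a named value tuple.
static (DateTime Start, DateTime End) RunRefresh(Action refresh)
{
    DateTime start = DateTime.UtcNow;
    refresh();                         // the actual data refresh would happen here
    DateTime end = DateTime.UtcNow;
    return (start, end);
}

var result = RunRefresh(() => Thread.Sleep(10));
Console.WriteLine($"Refresh took {(result.End - result.Start).TotalMilliseconds} ms");

// Deconstruction gives each element its own local.
var (started, ended) = RunRefresh(() => { });
```

One trade-off raised in the other answers still applies: adding a third value changes the tuple type in the signature, which is why several of them prefer a custom class once the result may grow.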
Your question points to the possibility that you'll be returning more data in the future, so I would recommend implementing your own class to contain the data.
What this means is that your method signature will remain the same even if the inner representation of the object you're passing around changes to accommodate more data. It's also good practice for readability and encapsulation reasons.
Architecture-wise, I'd always go with a custom class when I need to return a specific set of values. Why? Simply because a class is a "blueprint" for a commonly used data type; creating your own data type, which is what this is, will help you get a good structure and help others program against your interface.
Personally, I hate out/ref params, so I'd rather not use that approach. Also, most of the time, if you need to return more than one result, you are probably doing something wrong.
If it really is unavoidable, you will probably be happiest in the long run writing a custom class. Returning an array is tempting, as it is easy and effective in the short term, but using a class gives you the option of changing the return type in the future without having to worry too much about causing problems downstream. Imagine the potential for a debugging nightmare if someone swaps the order of two elements in the array that is returned...
I use out if it's only 1 or 2 additional variables (for example, a function returns a bool that is the actual important result, but also a long as an out parameter to return how long the function ran, for logging purposes).
For anything more complicated, I usually create a custom struct/class.
I think the most common way a C# programmer would do this would be to wrap the items you want to return in a separate class. This would provide you with the most flexibility going forward, IMHO.
It depends. For an internal only API, I'll usually choose the easiest option. Generally that's out.
For a public API, a custom class usually makes more sense - but if it's something fairly primitive, or the natural result of the function is a boolean (like *.TryParse) I'll stick with an out param. You can do a custom class with an implicit cast to bool as well, but that's usually just weird.
For your particular situation, a simple immutable DateRange class seems most appropriate to me. You can easily add that new value without disturbing existing users.
If you're wanting to send back the refresh start and end times, that suggests a possible class or struct, perhaps called DataRefreshResults. If your possible third value is also related to the refresh, then it could be added. Remember, a struct is a value type and is passed by value, so returning one does not create a separate heap object that needs to be garbage-collected.
Some people use KeyValuePair for two values. It's not great though because it just labels the two things as Key and Value. Not very descriptive. Also it would seriously benefit from having this added:
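using System.Collections.Generic;

// Helper: a generic method can infer K and V from its arguments.
public static class KeyValuePair
{
    public static KeyValuePair<K, V> Make<K, V>(K k, V v)
    {
        return new KeyValuePair<K, V>(k, v);
    }
}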
Saves you from having to specify the types when you create one. Generic methods can infer types, generic class constructors can't.
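The same inference trick is built into newer frameworks: .NET Core 2.0 and later ship a static KeyValuePair.Create method in System.Collections.Generic. A short comparison of the constructor and the factory method:

```csharp
using System;
using System.Collections.Generic;

// Constructor: the type arguments must be spelled out.
var explicitPair = new KeyValuePair<string, int>("retries", 3);

// Generic factory method: KeyValuePair<string, int> is inferred from the arguments.
var inferredPair = KeyValuePair.Create("retries", 3);

Console.WriteLine(inferredPair.Key + "=" + inferredPair.Value);
```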
For your scenario you may want to define generic Range{T} class (with checks for the range validity).
If the method is private, then I usually use tuples from my helper library. Public or protected methods generally always deserve a separate class.
Return a custom type, but don't use a class, use a struct - no memory allocation/garbage collection overhead means no downsides.
If 2, a Pair.
If more than 2 a class.
Another solution is to return a dictionary of named object references. To me, this is pretty equivalent to using a custom return class, but without the clutter. (And using RTTI and reflection it is just as typesafe as any other solution, albeit dynamically so.)
It depends on the type and meaning of the results, as well as whether the method is private or not.
For private methods, I usually just use a Tuple, from my class library.
For public/protected/internal methods (i.e. not private), I use either an out parameter or a custom class.
For instance, if I'm implementing the TryXYZ pattern, where you have an XYZ method that throws an exception on failure and a TryXYZ method that returns Boolean, TryXYZ will use an out parameter.
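A sketch of the TryXYZ convention: the framework's int.TryParse returns the bool as the main result and the parsed value through out, and a hypothetical TryDivide follows the same shape.

```csharp
using System;

// The framework's TryXYZ pattern: bool result, value via out.
bool parsed = int.TryParse("42", out int value);
bool rejected = !int.TryParse("forty-two", out _);

// Hypothetical TryXYZ-style method following the same convention.
static bool TryDivide(int dividend, int divisor, out int quotient)
{
    if (divisor == 0)
    {
        quotient = 0;      // out parameters must be assigned on every path
        return false;
    }
    quotient = dividend / divisor;
    return true;
}

bool divided = TryDivide(10, 3, out int q);
bool divideByZero = TryDivide(1, 0, out _);
Console.WriteLine($"{parsed} {value} {rejected} {divided} {q} {divideByZero}");
```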
If the results are sequence-oriented (i.e. return 3 customers that should be processed), then I will typically return some kind of collection.
Other than that I usually just use a custom class.
If a method outputs two or three related values, I would group them in a type. If the values are unrelated, the method is most likely doing way too much, and I would refactor it into a number of simpler methods.