Most efficient way to determine if a list is empty - c#

I have a function which will be called thousands of times per day, and I want to optimize this function so it will be as fast and efficient as possible.
In this function, a list will be checked and based on the result of this check, different actions will happen. My question is what is the most efficient way of determining how many elements are in this list.
Obviously, you can just check like this:
List<Objects> data = GetData();
if (data.Count == 0)
{
//Do something
}
else if (data.Count < 5)
{
//Do something with a small list
}
else
{
//Do something with a larger list
}
Is this already the fastest/most efficient way of doing this?
I came up with an alternative, but I would like some suggestions:
List<Objects> data = GetData();
int amountOfObjects = data.Count();
if (amountOfObjects == 0)
{
//Do something
}
else if (amountOfObjects < 5)
{
//Do something with a small list
}
else
{
//Do something with a larger list
}

You should use the Count property: on a List<T> it is a stored value that is simply read, never recalculated. The Count() extension method, by contrast, first has to check whether the source is a collection that already knows its size before it can return that value, and that check alone is more work than reading Count directly.
So just use what you have initially done.

For List<T>, the Count property just returns a field, because the implementation is an array-backed list that has to track exactly how many elements it holds. You therefore won't gain any performance by caching this value yourself.
This situation may be different when you use other collection implementations. For example, a linked list that does not track its own length has to walk every node to count them, which is an expensive operation. (.NET's LinkedList<T> does keep a running count, so its Count property is still cheap.)
Edit: Your alternative using Count() is actually worse. Count on List<T> is a non-virtual property, so the compiler emits a direct call that the JIT compiler can easily inline, whereas Count() involves a type check and a call through the ICollection<T> interface. That costs more, and the JIT compiler can do less magic such as inlining.
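To make the difference concrete, here is a simplified sketch of the kind of work a Count()-style extension method has to do before it can return a stored count. CountSketch and Classify are hypothetical names, and this is only an approximation of the general shape, not the actual Enumerable.Count() source.

```csharp
using System.Collections.Generic;

public static class CountExamples
{
    // Hypothetical approximation of a Count()-style extension method.
    public static int CountSketch<T>(IEnumerable<T> source)
    {
        // Fast path: the collection already knows its size, but we still pay
        // for a type check and an interface call to get at it.
        var collection = source as ICollection<T>;
        if (collection != null)
        {
            return collection.Count;
        }

        // Slow path: walk the whole sequence and count the elements.
        int count = 0;
        using (IEnumerator<T> enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                count++;
            }
        }
        return count;
    }

    // The original approach: read the Count property directly.
    public static string Classify(List<int> data)
    {
        if (data.Count == 0) return "empty";
        if (data.Count < 5) return "small";
        return "large";
    }
}
```

Both paths return the same number for a List<T>; the property simply gets there with less indirection.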

Related

Is it safe to use IEnumerable for something that may be enumerated multiple times?

For example let's say I have the following:
public class StringTester
{
IEnumerable<IStringConditional> Conditionals { get; }
public StringTester(IEnumerable<IStringConditional> conditionals)
{
Conditionals = conditionals;
}
public bool TestString(string testString)
{
foreach (var conditional in Conditionals)
{
if (!conditional.Test(testString))
{
return false;
}
}
return true;
}
}
Is this regarded as safe, or are IEnumerable types only safe for a single enumeration? If not what type would be best to use in a case like this?
It is undefined as to whether IEnumerable<T> is repeatable in the general case. Often (usually): it is. However, examples exist that either:
can only ever be iterated once (think "data from a socket")
can be iterated many times, but give completely different data each time (without any corresponding obvious state mutation)
So: in general I wouldn't advise ever iterating it more than once unless you know the scenario is a repeatable one. Things like arrays and lists are repeatable (if the data changes, of course, they may change to show those changes).
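As a small illustration of the second case, the sketch below defines a sequence whose contents differ on every enumeration, plus a helper that snapshots a sequence once so it can be iterated safely afterwards. RepeatabilityExample, Unstable, and Snapshot are hypothetical names used only for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RepeatabilityExample
{
    private static int _enumerations;

    // Hypothetical non-repeatable source: the iterator body runs again
    // on every enumeration, so each pass yields a different value.
    public static IEnumerable<int> Unstable()
    {
        _enumerations++;
        yield return _enumerations;
    }

    // Materialize the sequence once; later iterations read the stored copy.
    public static IReadOnlyList<T> Snapshot<T>(IEnumerable<T> source)
    {
        return source.ToList();
    }
}
```

If a constructor like the one in the question cannot know where its IEnumerable<T> came from, taking a snapshot of this kind is one way to guarantee repeatable results, at the cost of holding the data in memory.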
It should be safe but it might not be optimal. The devil is in the implementation.
Each foreach calls GetEnumerator(), which returns an IEnumerator. That interface declares a Reset() method, but foreach never calls it, and many enumerators (compiler-generated iterators, for example) throw NotSupportedException from it; a second enumeration simply asks for a new enumerator.
How the sequence produces that enumerator is what might cause a problem. If you're enumerating an array, the new enumerator just starts again with its internal index at -1 and it's ready to go.
If you're enumerating something connected over a network, you might have to redo the whole connection process and query the data again, which is slower and sub-optimal when done more than once.

Any benefit of using yield in this case?

I am maintaining some code at work and the original author is gone so thought I would ask here to see if I can satisfy my curiosity.
Below is a bit of code (anonymized) where yield is being used. As far as I can tell it does not add any benefit and just returning a list would be sufficient, maybe more readable as well (for me at least). Just wondering if I am missing something because this pattern is repeated in a couple of places in the code base.
public virtual IEnumerable<string> ValidProductTypes
{
get
{
yield return ProductTypes.Type1;
yield return ProductTypes.Type2;
yield return ProductTypes.Type3;
}
}
This property is used as a parameter for some class which just uses it to populate a collection:
var productManager = new ProductManager(ValidProductTypes);
public ProductManager(IEnumerable<string> validProductTypes)
{
var myFilteredList = GetFilteredTypes(validProductTypes);
}
public ObservableCollection<ValidType> GetFilteredTypes(IEnumerable<string> validProductTypes)
{
var filteredList = validProductTypes
.Where(type => TypeIsValid); //TypeIsValid returns a ValidType
return new ObservableCollection<ValidType>(filteredList);
}
I'd say that returning an IEnumerable<T> and implementing that using yield return is the simplest option.
If you see that a method returns an IEnumerable<T>, there really is only one thing you can do with it: iterate it. Any more complicated operations on it (like using LINQ) are just encapsulated specific ways of iterating it.
If a method returns an array or list, you also gain the ability to mutate it and you might start wondering if that's an acceptable use of the API. For example, what happens if you do ValidProductTypes.Add("a new product")?
If you're talking just about the implementation, then the difference becomes much smaller. But the caller would still be able to cast the returned array or list from IEnumerable<T> to its concrete type and mutate that. The chance that anyone would actually think this was the intended use of the API is small, but with yield return, the chance is zero, because it's not possible.
Considering that I'd say the syntax has roughly the same complexity and ease of understanding, I think yield return is a reasonable choice. Though with C# 6.0 expression bodied properties, the syntax for arrays might get the upper hand:
public virtual IEnumerable<string> ValidProductTypes =>
new[] { ProductTypes.Type1, ProductTypes.Type2, ProductTypes.Type3 };
The above answer is assuming that this is not performance-critical code, so fairly small differences in performance won't matter. If this is performance-critical code, then you should measure which option is better. And you might also want to consider getting rid of allocations (probably by caching the array in a field, or something like that), which might be the dominant factor.
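A minimal sketch of the caching idea, assuming the product types never change at run time. The ProductTypes constants and the ProductCatalog class are stand-ins invented for this example; Array.AsReadOnly wraps the array so that callers cannot cast the result back to string[] and mutate it.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the anonymized ProductTypes class; the values are hypothetical.
public static class ProductTypes
{
    public const string Type1 = "Type1";
    public const string Type2 = "Type2";
    public const string Type3 = "Type3";
}

public class ProductCatalog
{
    // Allocated once when the class is first used, then shared by every call.
    private static readonly IEnumerable<string> CachedTypes =
        Array.AsReadOnly(new[] { ProductTypes.Type1, ProductTypes.Type2, ProductTypes.Type3 });

    public virtual IEnumerable<string> ValidProductTypes
    {
        get { return CachedTypes; }
    }
}
```

Whether this is actually faster than yield return in a given code base is something to measure rather than assume.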

C# LazyList for lots of entries

I recently became aware of the notion of LazyList, and I would like to implement this notion in my work.
I have several methods which may retrieve hundreds of thousands of entries from the database, and I want to return a LazyList<T> rather than a typical List<T>.
I could only find Lazy<List<T>>, which is, to my understanding, not the same. Lazy<List<T>> makes the initialization of the list lazy; that's not what I need.
I want to give an example from Scheme language, if someone ever used it.
Basically, it is implemented with linked nodes, where the value of a given node is calculated on demand and node.next is actually a function that has to be evaluated to retrieve the next value.
I wonder how to actually handle lists of 400k entries or so. It sounds expensive to hold a List of a couple of MB which, possibly, can grow to GBs depending on the DB operation the program needs to do.
I'm currently using .NET 4.5; the C# version is 4.
Instead of returning a List<T> or LazyList, why not yield return the results? This is much better than retrieving all rows. It will stream it row by row. Better for memory management.
For example: (PSEUDO)
private IEnumerable<Row> GetRows(SqlConnection connection)
{
var resultSet = connection.ExecuteQuery(.....);
resultSet.Open();
try
{
while(resultSet.FetchNext())
{
// read one row..
yield return row;
}
}
finally
{
resultSet.Close();
}
}
foreach(var row in GetRows(connection))
{
// handle the row.
}
This way the result set is handled one row at a time.
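The same pattern can be tried without a database. The sketch below uses a plain loop as a stand-in for the result set and records each step in a log, which shows that rows are produced only on demand and that the finally block still runs when the consumer stops early. StreamingExample, ReadRows, and Log are hypothetical names for this illustration.

```csharp
using System.Collections.Generic;

public static class StreamingExample
{
    // Records what the iterator does, in order.
    public static readonly List<string> Log = new List<string>();

    // Stand-in for a database cursor: rows are produced one at a time.
    public static IEnumerable<int> ReadRows(int totalRows)
    {
        Log.Add("open");
        try
        {
            for (int i = 0; i < totalRows; i++)
            {
                Log.Add("fetch " + i);
                yield return i;
            }
        }
        finally
        {
            // Runs when enumeration completes or the enumerator is disposed,
            // which foreach does automatically, even after a break.
            Log.Add("close");
        }
    }
}
```

Breaking out of a foreach over ReadRows(400000) after the second row leaves only "open", "fetch 0", "fetch 1", and "close" in the log; the remaining rows are never fetched.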

Basic array Any() vs Length

I have a simple array of objects:
Contact[] contacts = _contactService.GetAllContacts();
I want to test if that method returns any contacts. I really like the LINQ syntax for Any() as it highlights what I am trying to achieve:
if(!contacts.Any()) return;
However, is this slower than just testing the length of the array?
if(contacts.Length == 0) return;
Is there any way I can find out what kind of operation Any() performs in this instance without having to come here to ask? Something like a profiler, but for in-memory collections?
There are two Any() methods:
1. An extension method for IEnumerable<T>
2. An extension method for IQueryable<T>
I'm guessing that you're using the extension method for IEnumerable<T>. That one looks like this:
public static bool Any<T>(this IEnumerable<T> enumerable)
{
foreach (var item in enumerable)
{
return true;
}
return false;
}
Basically, using Length == 0 is faster because it doesn't involve creating an iterator for the array.
If you want to check out code that isn't yours (that is, code that has already been compiled), like Any<T>, you can use some kind of disassembler. Jetbrains has one for free - http://www.jetbrains.com/decompiler/
I have to completely disagree with the other answers. It certainly does not iterate over the array. It will be marginally slower, as it needs to create an array iterator object and call MoveNext() once, but that cost should be negligible in most scenarios; if Any() makes the code more readable to you, feel free to use it.
Source: Decompiled Enumerable.Any<TSource> code.
If you have an array, the Length is a property of the array. When calling Any() you start iterating the array to find the first element. Setting up the enumerator is probably more expensive than just reading the Length property.
In your particular case Length is slightly better:
// Just a private field test: is it zero or not?
if (contacts.Length == 0)
return;
// LINQ overhead is added: conceptually a loop such as
// for (int i = 0; i < contacts.Length; ++i)
// return true;
// plus the inevitable private field test (i < contacts.Length)
if (!contacts.Any())
return;
But the difference seems negligible.
In the general case, however, Any is better, because it stops at the first item found:
// Iterates until the first matching item is found
if (contacts.Any(x => MyCondition(x)))
return;
// Iterates the entire collection
if (contacts.Select(x => MyCondition(x)).Count() > 0)
return;
Yes, it is slower because it starts iterating over the elements; using the Length property is better. Still, I don't think there is a significant difference, because Any() returns true as soon as it finds an item.
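The point the answers agree on, that Any() touches at most one element, can be checked with a counting iterator. In the sketch below, AnySketch is a hypothetical approximation of a parameterless Any() (not the library source), and Numbers counts how many items it was actually asked to produce.

```csharp
using System.Collections.Generic;

public static class AnyExample
{
    // Counts how many items the iterator below has produced so far.
    public static int ItemsProduced;

    public static IEnumerable<int> Numbers()
    {
        for (int i = 0; i < 1000; i++)
        {
            ItemsProduced++;
            yield return i;
        }
    }

    // Hypothetical approximation of Any(): create an enumerator,
    // advance it once, and report whether an element was there.
    public static bool AnySketch<T>(IEnumerable<T> source)
    {
        using (IEnumerator<T> enumerator = source.GetEnumerator())
        {
            return enumerator.MoveNext();
        }
    }
}
```

The cost over reading Length is therefore one enumerator and one MoveNext() call, independent of how many elements the sequence holds.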

GetType() == type performance

I am making an XNA game and I am calling the following code 2 to 20 times per update. I tried googling and it seems like this is semi-slow, so I just thought I'd ask if there is any faster way to compare types?
Code:
public Modifier this[Type type]
{
get
{
for (int i = 0; i < this.Count; i++)
{
if (this[i].GetType() == type)
{
return this[i];
}
}
throw new NotImplementedException("Fix this");
}
set
{
for (int i = 0; i < this.Count; i++)
{
if (this[i].GetType() == type)
{
this[i] = value;
}
}
if(System.Diagnostics.Debugger.IsAttached)
System.Diagnostics.Debugger.Break();
}
}
This code is in a ModifierCollection class which inherits from List. Modifier is part of a particle engine. Also, my game isn't yet in a condition where I can actually test this, but this should work, right?
I read something about RuntimeTypeHandle, which should be faster; should I use it?
EDIT: What I am aiming to do with this is that I can do the following:
(particleEffect["NameOfEmitter"].Modifiers[typeof(SomeCoolModifier)] as SomeCoolModifier).Variable = Value;
Basically I just want to change the value of some Modifiers in runtime.
EDIT 2: I just realized that I can just save the reference of Modifier to the class where I am at the moment calling this :P Maybe not as clean code if I have 5-10 modifiers but should remove this problem.
If you don't need any of the extra functionality exposed by Type, and you're only concerned with absolute equality between types--i.e., you don't need to support inheritance--RuntimeTypeHandle is the fastest way to do this comparison.
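A minimal sketch of an exact-type check based on RuntimeTypeHandle. TypeCheckExample and IsExactly are hypothetical names; Type.GetTypeHandle and Type.TypeHandle are the framework members used. Whether this beats GetType() == type on a particular runtime is an assumption that should be verified by measurement.

```csharp
using System;

public static class TypeCheckExample
{
    // True only when the runtime type of instance is exactly T;
    // derived types do not match.
    public static bool IsExactly<T>(object instance)
    {
        return Type.GetTypeHandle(instance).Equals(typeof(T).TypeHandle);
    }
}
```

For example, IsExactly<string>("abc") is true, while IsExactly<object>("abc") is false because inheritance is ignored.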
Really, though, I would question whether this isn't a weakness of your class design. Unless you have a compelling reason to check the type directly, it's probably better to expose some sort of value (probably an enum) on your objects that represents what they are, and do your comparisons against that.
If you want to be really fast and can trust the code that is calling you, change the indexer to just take an int. Then in whatever method (which you didn't show) that callers use to add Types to the list, return back to them the corresponding int. It's a worse API but it means you don't have to do any loops or lookups.
You could store the values in a dictionary indexed by type rather than a list so you wouldn't have to do an O(n) iteration over the list each time.
As noted in the comments, this does depend on the size of n and may be a micro-optimization. I'd recommend profiling your application.
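The dictionary suggestion could look roughly like this, assuming at most one modifier of each concrete type is stored. Modifier, GravityModifier, and ModifierLookup are stand-in types written for this sketch, not the question's actual classes.

```csharp
using System;
using System.Collections.Generic;

// Stand-ins for the question's particle-engine types.
public abstract class Modifier { }

public sealed class GravityModifier : Modifier
{
    public float Strength;
}

public class ModifierLookup
{
    private readonly Dictionary<Type, Modifier> _byType = new Dictionary<Type, Modifier>();

    // Assumption: one modifier per concrete type; a later Add replaces an earlier one.
    public void Add(Modifier modifier)
    {
        _byType[modifier.GetType()] = modifier;
    }

    // Hash lookup instead of a linear scan; returns null when nothing is registered.
    public T Get<T>() where T : Modifier
    {
        Modifier found;
        if (_byType.TryGetValue(typeof(T), out found))
        {
            return (T)found;
        }
        return null;
    }
}
```

A generic Get<T>() also removes the cast at the call site, so a call such as lookup.Get<GravityModifier>().Strength = 9.8f; replaces the indexer-plus-cast expression from the question.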
