Unexpected behavior using Enumerable.Empty<string>() - c#

I would expect Enumerable.Empty<string>() to return an empty array of strings. Instead, it appears to return an array with a single null value. This breaks other LINQ operators like DefaultIfEmpty, since the enumerable is not, in fact, empty. This doesn't seem to be documented anywhere, so I'm wondering if I'm missing something (99% probability).
GameObject Class
public GameObject(string id,IEnumerable<string> keywords) {
if (String.IsNullOrWhiteSpace(id)) {
throw new ArgumentException("invalid", "id");
}
if (keywords==null) {
throw new ArgumentException("invalid", "keywords");
}
if (keywords.DefaultIfEmpty() == null) { //This line doesn't work correctly.
throw new ArgumentException("invalid", "keywords");
}
if (keywords.Any(kw => String.IsNullOrWhiteSpace(kw))) {
throw new ArgumentException("invalid", "keywords");
}
_id = id;
_keywords = new HashSet<string>(keywords);
}
Test
[TestMethod]
[ExpectedException(typeof(ArgumentException))]
public void EmptyKeywords() {
GameObject test = new GameObject("test",System.Linq.Enumerable.Empty<string>());
}

It looks like you expect this condition:
keywords.DefaultIfEmpty() == null
to evaluate to true. However, DefaultIfEmpty returns a singleton sequence containing the default value for the element type (string in this case) if the source sequence is empty, so here it returns a sequence containing a single null. The sequence itself is not null, so the condition evaluates to false.
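A minimal sketch of that distinction, assuming nothing beyond System.Linq. The class and method names here (DefaultIfEmptyDemo and so on) are made up for illustration and are not part of the question's code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DefaultIfEmptyDemo
{
    // How many elements DefaultIfEmpty yields for an empty source.
    public static int CountAfterDefaultIfEmpty()
    {
        IEnumerable<string> padded = Enumerable.Empty<string>().DefaultIfEmpty();
        return padded.Count(); // one element: the default(string) placeholder
    }

    // The placeholder element is null, but the sequence object itself is not.
    public static bool SequenceIsNull()
    {
        return Enumerable.Empty<string>().DefaultIfEmpty() == null;
    }

    // A direct emptiness test that does not involve DefaultIfEmpty at all.
    public static bool IsEmpty(IEnumerable<string> keywords)
    {
        return !keywords.Any();
    }
}
```

If the goal is only to reject an empty keyword list, a check along the lines of IsEmpty is probably closer to the intent than comparing the result of DefaultIfEmpty to null.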

You are misinterpreting what DefaultIfEmpty does. Here is its implementation from the reference source:
public static IEnumerable<TSource> DefaultIfEmpty<TSource>(this IEnumerable<TSource> source) {
return DefaultIfEmpty(source, default(TSource));
}
public static IEnumerable<TSource> DefaultIfEmpty<TSource>(this IEnumerable<TSource> source, TSource defaultValue) {
if (source == null) throw Error.ArgumentNull("source");
return DefaultIfEmptyIterator<TSource>(source, defaultValue);
}
static IEnumerable<TSource> DefaultIfEmptyIterator<TSource>(IEnumerable<TSource> source, TSource defaultValue) {
using (IEnumerator<TSource> e = source.GetEnumerator()) {
if (e.MoveNext()) {
do {
yield return e.Current;
} while (e.MoveNext());
}
else {
yield return defaultValue;
}
}
}
So if the IEnumerable<T> is not empty, it simply yields that sequence's elements; if the IEnumerable<T> is empty, it returns a new IEnumerable<T> containing a single element with the value default(T). It will never return null, which is what your code is testing for. If you wanted to test this you would need to do
if(keywords.DefaultIfEmpty().First() == null)
However, this is going to cause the IEnumerable<string> to be evaluated multiple times. I would drop the LINQ and do it the long way, just as the LINQ method does (this also gets rid of the extra evaluation you had inside new HashSet<string>(keywords)).
public GameObject(string id,IEnumerable<string> keywords)
{
if (String.IsNullOrWhiteSpace(id)) {
throw new ArgumentException("invalid", "id");
}
if (keywords==null) {
throw new ArgumentException("invalid", "keywords");
}
_keywords = new HashSet<string>();
using (var e = keywords.GetEnumerator())
{
if (e.MoveNext())
{
do
{
if(e.Current == null)
throw new ArgumentException("invalid", "keywords");
_keywords.Add(e.Current);
} while (e.MoveNext());
}
else
{
throw new ArgumentException("invalid", "keywords");
}
}
_id = id;
}
This makes it so you only loop once over the IEnumerable<string>.
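If buffering the keywords in memory is acceptable (an assumption, not something the question states), a shorter way to get the same single pass is to copy the sequence into a list first and validate the copy. This is only a sketch: ToValidatedSet is a hypothetical helper name, and it applies the question's IsNullOrWhiteSpace rule rather than the plain null check used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordValidation
{
    // Hypothetical helper: enumerates the source once, then validates the copy.
    public static HashSet<string> ToValidatedSet(IEnumerable<string> keywords)
    {
        if (keywords == null)
            throw new ArgumentNullException(nameof(keywords));

        // The only enumeration of the caller's sequence happens here.
        List<string> materialized = keywords.ToList();

        if (materialized.Count == 0 ||
            materialized.Any(k => string.IsNullOrWhiteSpace(k)))
        {
            throw new ArgumentException("invalid", nameof(keywords));
        }

        // Built from the in-memory copy, not from the original sequence.
        return new HashSet<string>(materialized);
    }
}
```

The trade-off is one extra list allocation in exchange for code that stays close to the original LINQ version.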

Does this solve your problem?
public GameObject(string id, IEnumerable<string> keywords) {
if (String.IsNullOrWhiteSpace(id)) {
throw new ArgumentException("invalid", "id");
}
if (keywords == null || !keywords.Any()
|| keywords.Any(k => String.IsNullOrWhiteSpace(k))) {
throw new ArgumentException("invalid", "keywords");
}
_id = id;
_keywords = new HashSet<string>(keywords);
}
*Improved the code with suggestions from @ScottChamberlain & @ginkner

Related

Calling method with IEnumerable<T> sequence as argument, if that sequence is not empty

I have a method Foo, which does some CPU-intensive computations and returns an IEnumerable<T> sequence. I need to check whether that sequence is empty and, if not, call method Bar with that sequence as its argument.
I thought about three approaches...
Check whether the sequence is empty with Any(). This is OK if the sequence really is empty, which will be the case most of the time. But it will have horrible performance if the sequence contains some elements and Foo needs to compute them again...
Convert the sequence to a list, check whether that list is empty... and pass it to Bar. This also has a limitation: Bar will need only the first x items, so Foo will be doing unnecessary work...
Check whether the sequence is empty without actually resetting the sequence. This sounds like a win-win, but I can't find any easy built-in way to do it. So I created this obscure workaround and am wondering whether it is really the best approach.
Condition
var source = Foo();
if (!IsEmpty(ref source))
Bar(source);
with IsEmpty implemented as
bool IsEmpty<T>(ref IEnumerable<T> source)
{
var enumerator = source.GetEnumerator();
if (enumerator.MoveNext())
{
source = CreateIEnumerable(enumerator);
return false;
}
return true;
IEnumerable<T> CreateIEnumerable(IEnumerator<T> usedEnumerator)
{
yield return usedEnumerator.Current;
while (usedEnumerator.MoveNext())
{
yield return usedEnumerator.Current;
}
}
}
Also note that calling Bar with an empty sequence is not an option...
EDIT:
After some consideration, the best answer for my case is from Olivier Jacot-Descombes: avoid that scenario completely. The accepted solution answers this question, if there really is no other way.
I don't know whether your algorithm in Foo allows you to determine whether the enumeration will be empty without doing the calculations. If it does, return null when the sequence would be empty:
public IEnumerable<T> Foo()
{
if (<check if sequence will be empty>) {
return null;
}
return GetSequence();
}
private IEnumerable<T> GetSequence()
{
...
yield return item;
...
}
Note that if a method uses yield return, it cannot use a simple return to return null. Therefore a second method is needed.
var sequence = Foo();
if (sequence != null) {
Bar(sequence);
}
After reading one of your comments
Foo need to initialize some resources, parse XML file and fill some HashSets, which will be used to filter (yield) returned data.
I suggest another approach. The time consuming part seems to be the initialization. To be able to separate it from the iteration, create a foo calculator class. Something like:
public class FooCalculator<T>
{
private bool _isInitialized;
private string _file;
public FooCalculator(string file)
{
_file = file;
}
private void EnsureInitialized()
{
if (_isInitialized) return;
// Parse XML.
// Fill some HashSets.
_isInitialized = true;
}
public IEnumerable<T> Result
{
get {
EnsureInitialized();
...
yield return ...;
...
}
}
}
This ensures that the costly initialization stuff is executed only once. Now you can safely use Any().
Other optimizations are conceivable. The Result property could remember the position of the first returned element, so that if it is called again, it could skip to it immediately.
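To make the idea concrete, here is a small runnable sketch of such a calculator. Everything in it is an assumption for illustration: FilterCalculator, the int candidates, and the hard-coded HashSet stand in for the real XML parsing, which the question does not show.

```csharp
using System.Collections.Generic;
using System.Linq;

public class FilterCalculator
{
    private readonly int[] _candidates;
    private HashSet<int> _allowed; // built lazily, on first enumeration

    public FilterCalculator(int[] candidates)
    {
        _candidates = candidates;
    }

    // Counts how often the (pretend) expensive setup actually ran.
    public int InitializationCount { get; private set; }

    private void EnsureInitialized()
    {
        if (_allowed != null) return;
        InitializationCount++;
        _allowed = new HashSet<int> { 2, 4, 6 }; // stand-in for parsing XML etc.
    }

    public IEnumerable<int> Result
    {
        get
        {
            EnsureInitialized();
            foreach (int candidate in _candidates)
            {
                if (_allowed.Contains(candidate))
                    yield return candidate;
            }
        }
    }

    // Example caller: Any() and the later full enumeration share one setup.
    public static int Demo()
    {
        var calculator = new FilterCalculator(new[] { 1, 2, 3, 4 });
        if (calculator.Result.Any())
        {
            calculator.Result.ToList();
        }
        return calculator.InitializationCount;
    }
}
```

Note that the filtering loop itself still restarts on each enumeration; only the setup is shared, which matches the case where initialization is the costly part.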
You would like to call some function Bar<T>(IEnumerable<T> source) if and only if the enumerable source contains at least one element, but you're running into two problems:
There is no method T Peek() in IEnumerable<T> so you would need to actually begin to evaluate the enumerable to see if it's nonempty, but...
You don't want to even partially double-evaluate the enumerable since setting up the enumerable might be expensive.
In that case your approach looks reasonable. You do, however, have some issues with your implementation:
You need to dispose enumerator after using it.
As pointed out by Ivan Stoev in comments, if the Bar() method attempts to evaluate the IEnumerable<T> more than once (e.g. by calling Any() then foreach (...)) then the results will be undefined because usedEnumerator will have been exhausted by the first enumeration.
To resolve these issues, I'd suggest modifying your API a little and creating an extension method IfNonEmpty<T>(this IEnumerable<T> source, Action<IEnumerable<T>> func) that calls a specified method only if the sequence is nonempty, as shown below:
public static partial class EnumerableExtensions
{
public static bool IfNonEmpty<T>(this IEnumerable<T> source, Action<IEnumerable<T>> func)
{
if (source == null || func == null)
throw new ArgumentNullException();
using (var enumerator = source.GetEnumerator())
{
if (!enumerator.MoveNext())
return false;
func(new UsedEnumerator<T>(enumerator));
return true;
}
}
class UsedEnumerator<T> : IEnumerable<T>
{
IEnumerator<T> usedEnumerator;
public UsedEnumerator(IEnumerator<T> usedEnumerator)
{
if (usedEnumerator == null)
throw new ArgumentNullException();
this.usedEnumerator = usedEnumerator;
}
public IEnumerator<T> GetEnumerator()
{
var localEnumerator = System.Threading.Interlocked.Exchange(ref usedEnumerator, null);
if (localEnumerator == null)
// An attempt has been made to enumerate usedEnumerator more than once;
// throw an exception since this is not allowed.
throw new InvalidOperationException();
yield return localEnumerator.Current;
while (localEnumerator.MoveNext())
{
yield return localEnumerator.Current;
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
}
Demo fiddle with unit tests here.
If you can change Bar, how about changing it to TryBar, which returns false when the IEnumerable<T> is empty?
bool TryBar(IEnumerable<Foo> source)
{
var count = 0;
foreach (var x in source)
{
count++;
}
return count > 0;
}
If that doesn't work for you, you could always create your own IEnumerable<T> wrapper that caches values after they have been iterated once.
One improvement for your IsEmpty would be to check if source is ICollection<T>, and if it is, check .Count (also, dispose the enumerator):
bool IsEmpty<T>(ref IEnumerable<T> source)
{
if (source is ICollection<T> collection)
{
return collection.Count == 0;
}
var enumerator = source.GetEnumerator();
if (enumerator.MoveNext())
{
source = CreateIEnumerable(enumerator);
return false;
}
enumerator.Dispose();
return true;
IEnumerable<T> CreateIEnumerable(IEnumerator<T> usedEnumerator)
{
yield return usedEnumerator.Current;
while (usedEnumerator.MoveNext())
{
yield return usedEnumerator.Current;
}
usedEnumerator.Dispose();
}
}
This will work for arrays and lists.
I would, however, rework IsEmpty to return the sequence itself, or null if it is empty:
IEnumerable<T> NotEmpty<T>(IEnumerable<T> source)
{
if (source is ICollection<T> collection)
{
if (collection.Count == 0)
{
return null;
}
return source;
}
var enumerator = source.GetEnumerator();
if (enumerator.MoveNext())
{
return CreateIEnumerable(enumerator);
}
enumerator.Dispose();
return null;
IEnumerable<T> CreateIEnumerable(IEnumerator<T> usedEnumerator)
{
yield return usedEnumerator.Current;
while (usedEnumerator.MoveNext())
{
yield return usedEnumerator.Current;
}
usedEnumerator.Dispose();
}
}
Now, you would check if it returned null.
The accepted answer is probably the best approach but, based on, and I quote:
Convert the sequence to a list, check whether that list is empty... and pass it to Bar. This also has a limitation: Bar will need only the first x items, so Foo will be doing unnecessary work...
Another take would be creating an IEnumerable<T> that partially caches the underlying enumeration. Something along the following lines:
interface IDisposableEnumerable<T>
:IEnumerable<T>, IDisposable
{
}
static class PartiallyCachedEnumerable
{
public static IDisposableEnumerable<T> Create<T>(
IEnumerable<T> source,
int cachedCount)
{
if (source == null)
throw new ArgumentNullException(
nameof(source));
if (cachedCount < 1)
throw new ArgumentOutOfRangeException(
nameof(cachedCount));
return new partiallyCachedEnumerable<T>(
source, cachedCount);
}
private class partiallyCachedEnumerable<T>
: IDisposableEnumerable<T>
{
private readonly IEnumerator<T> enumerator;
private bool disposed;
private readonly List<T> cache;
private readonly bool hasMoreItems;
public partiallyCachedEnumerable(
IEnumerable<T> source,
int cachedCount)
{
Debug.Assert(source != null);
Debug.Assert(cachedCount > 0);
enumerator = source.GetEnumerator();
cache = new List<T>(cachedCount);
var count = 0;
while (count < cachedCount &&
enumerator.MoveNext())
{
cache.Add(enumerator.Current);
count += 1;
}
hasMoreItems = !(count < cachedCount);
}
public void Dispose()
{
if (disposed)
return;
enumerator.Dispose();
disposed = true;
}
public IEnumerator<T> GetEnumerator()
{
foreach (var t in cache)
yield return t;
if (disposed)
yield break;
while (enumerator.MoveNext())
{
yield return enumerator.Current;
cache.Add(enumerator.Current);
}
Dispose();
}
IEnumerator IEnumerable.GetEnumerator()
=> GetEnumerator();
}
}

How to create a <T> function for list comparison?

public static bool CompareLists(List<Product> lstProduct1, List<Product> lstProduct2, List<DuplicateExpression> DuplicateExpression)
{
string[] Fields = DuplicateExpression.Select(x => x.ExpressionName).ToArray();
//var JoinExp = lstProduct1.Join(lstProduct2, new[] { "ProductName", "ProductCode" });
var JoinExp = lstProduct1.Join(lstProduct2, Fields);
bool IsSuccess = CompareTwoLists(lstProduct1, lstProduct2, (listProductx1, listProductx2) => JoinExp.Any());
return IsSuccess;
}
How can I convert the above function into a generic <T> function? It is actually a list comparison function.
SequenceEqual solves your problem.
new[] { "A", "B" }.SequenceEqual(new[] { "A", "B" }).Should().BeTrue();
Here is the source code.
public static bool SequenceEqual<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
if (comparer == null) comparer = EqualityComparer<TSource>.Default;
if (first == null) throw Error.ArgumentNull("first");
if (second == null) throw Error.ArgumentNull("second");
using (IEnumerator<TSource> e1 = first.GetEnumerator())
using (IEnumerator<TSource> e2 = second.GetEnumerator())
{
while (e1.MoveNext())
{
if (!(e2.MoveNext() && comparer.Equals(e1.Current, e2.Current)))
return false;
}
if (e2.MoveNext())
return false;
}
return true;
}
In your case you could elect to replace IEnumerable<TSource> with IList<TSource> or even List<TSource>, although ideally the highest level of abstraction is preferred.
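As a rough sketch of what a generic version could look like, the wrapper below simply forwards to SequenceEqual. The class name ListComparison and the optional comparer parameter are assumptions for illustration. Note that this checks element-by-element, order-sensitive equality, which may not be the same thing as the field-based Join in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListComparison
{
    // Generic, order-sensitive comparison of two sequences.
    // A custom comparer can be supplied to decide when two items count as equal.
    public static bool CompareLists<T>(
        IEnumerable<T> first,
        IEnumerable<T> second,
        IEqualityComparer<T> comparer = null)
    {
        if (first == null) throw new ArgumentNullException(nameof(first));
        if (second == null) throw new ArgumentNullException(nameof(second));

        return first.SequenceEqual(second, comparer ?? EqualityComparer<T>.Default);
    }
}
```

For a type like Product, the comparer is where the choice of fields to compare (for example name and code) would go.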

Method should throw exception but it doesn't

I wrote a small extension method which finds the indexes of the given string in any IEnumerable.
public static IEnumerable<int> FindIndexesOf(this IEnumerable<string> itemList, string indexesToFind)
{
if (itemList == null)
throw new ArgumentNullException("itemList");
if (indexesToFind == null)
throw new ArgumentNullException("indexesToFind");
List<string> enumerable = itemList as List<string> ?? itemList.ToList();
for (int i = 0; i < enumerable.Count(); i++)
{
if (enumerable[i] == indexesToFind)
yield return i;
}
}
As you can see above, an ArgumentNullException is thrown if itemList is null. Plain and simple.
When running my unit test on the above method, I expect an exception of type ArgumentNullException, because itemList is null. However, the test fails because no exception gets thrown.
How is that possible? The logic seems quite clear. See the test below.
[TestMethod]
[ExpectedException(typeof(ArgumentNullException))]
public void FindIndexesOfTest2()
{
string[] items = null;
IEnumerable<int> indexes = items.FindIndexesOf("one");
}
Where am I going wrong in my logic; why is it not throwing an ArgumentNullException?
The problem is that iterator methods using yield are lazily evaluated.
Since you're not iterating over the returned collection, the method body hasn't actually executed.
The correct way to do this is to split the method in two:
public static IEnumerable<int> FindIndexesOf(this IEnumerable<string> itemList, string indexesToFind)
{
if (itemList == null)
throw new ArgumentNullException("itemList");
if (indexesToFind == null)
throw new ArgumentNullException("indexesToFind");
return FindIndexesOfImpl(itemList, indexesToFind);
}
private static IEnumerable<int> FindIndexesOfImpl(this IEnumerable<string> itemList, string indexesToFind)
{
List<string> enumerable = itemList as List<string> ?? itemList.ToList();
for (int i = 0; i < enumerable.Count(); i++)
{
if (enumerable[i] == indexesToFind)
yield return i;
}
}
Here the first method executes as soon as you call it, and returns a lazily evaluated enumerable whose body does not run until you iterate over it.
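The difference in timing can be observed directly. The following is a small sketch, not the code from the question: LazyLengths, EagerLengths, and the ThrowsArgumentNull helper are hypothetical names used only to show when the null check runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredValidationDemo
{
    // Iterator method: nothing in the body, including the null check,
    // runs until the caller starts enumerating.
    public static IEnumerable<int> LazyLengths(IEnumerable<string> items)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        foreach (string item in items)
            yield return item.Length;
    }

    // Non-iterator wrapper: the null check runs immediately on the call.
    public static IEnumerable<int> EagerLengths(IEnumerable<string> items)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        return LazyLengths(items);
    }

    // Reports whether running the action throws ArgumentNullException.
    public static bool ThrowsArgumentNull(Action action)
    {
        try { action(); return false; }
        catch (ArgumentNullException) { return true; }
    }

    // Calling the iterator alone does not throw.
    public static bool LazyThrowsOnCall() =>
        ThrowsArgumentNull(() => LazyLengths(null));

    // Enumerating it (here via ToList) does.
    public static bool LazyThrowsOnEnumeration() =>
        ThrowsArgumentNull(() => LazyLengths(null).ToList());

    // The wrapper throws on the call itself.
    public static bool EagerThrowsOnCall() =>
        ThrowsArgumentNull(() => EagerLengths(null));
}
```

This is why a test using ExpectedException passes against the wrapper version without ever enumerating the result.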
Though, I would suggest you also change the latter method here to be truly lazily evaluated. The fact that the method caches the entire itemList just to be able to use indexes is unnecessary, and you can in fact rewrite it without it:
public static IEnumerable<int> FindIndexesOfImpl(this IEnumerable<string> itemList, string indexesToFind)
{
var index = 0;
foreach (var item in itemList)
{
if (item == indexesToFind)
yield return index;
index++;
}
}
You can also use the LINQ extension methods to do it, though this involves constructing a temporary object for each element. I'm unsure whether it is worth it; I'd go with the one just above instead:
public static IEnumerable<int> FindIndexesOfImpl(this IEnumerable<string> itemList, string indexesToFind)
{
return itemList
.Select((item, index) => new { item, index })
.Where(element => element.item == indexesToFind)
.Select(element => element.index);
}
With this last method you can move this back up into the main method because you're no longer using yield:
public static IEnumerable<int> FindIndexesOf(this IEnumerable<string> itemList, string indexesToFind)
{
if (itemList == null)
throw new ArgumentNullException("itemList");
if (indexesToFind == null)
throw new ArgumentNullException("indexesToFind");
return itemList
.Select((item, index) => new { item, index })
.Where(element => element.item == indexesToFind)
.Select(element => element.index);
}

string s[s.Length-1] vs s.Last()

Q1) I wonder whether calling the s.Last() LINQ extension method is as efficient as doing s[s.Length-1]. I prefer the first option, but I don't know if the implementation takes advantage of the current type.
Q2) This could be another interesting question. Do LINQ extension methods take advantage of the type they are used on, or do they just see the object as an IEnumerable?
No, it will not be as efficient as directly indexing, which is O(1). We can see in the reference source for Enumerable.Last:
public static TSource Last<TSource>(this IEnumerable<TSource> source) {
if (source == null) throw Error.ArgumentNull("source");
IList<TSource> list = source as IList<TSource>;
if (list != null) {
int count = list.Count;
if (count > 0) return list[count - 1];
}
else {
using (IEnumerator<TSource> e = source.GetEnumerator()) {
if (e.MoveNext()) {
TSource result;
do {
result = e.Current;
} while (e.MoveNext());
return result;
}
}
}
throw Error.NoElements();
}
Since String does not implement IList<char>, it will go to the branch that uses the enumerator, requiring every character to be visited until the last one is reached (which is O(n)).
As you can see, in some cases LINQ methods take advantage of more efficient ways to access data provided by various interfaces. Other examples include First, Count, and ElementAt.
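Whether a given source would hit the IList<T> branch in the code quoted above can be checked with a simple type test. This is only a sketch with a made-up helper name (FastPathDemo.ImplementsIList), and it reflects the reference-source implementation shown here; other runtime versions may use different internal checks.

```csharp
using System.Collections.Generic;

public static class FastPathDemo
{
    // True when the source exposes IList<T>, the interface the quoted
    // Last() implementation tests for before falling back to enumeration.
    public static bool ImplementsIList<T>(IEnumerable<T> source)
    {
        return source is IList<T>;
    }
}
```

Passing a string reports false, while a char[] or a List<char> reports true, so converting to an array only to call Last() would trade the O(n) walk for an O(n) copy.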
It is not as efficient. It has a special case for anything that implements IList<T>, but not for string. Here is the implementation from Reflector.
[__DynamicallyInvokable]
public static TSource Last<TSource>(this IEnumerable<TSource> source)
{
if (source == null)
{
throw Error.ArgumentNull("source");
}
IList<TSource> list = source as IList<TSource>;
if (list != null)
{
int count = list.Count;
if (count > 0)
{
return list[count - 1];
}
}
else
{
using (IEnumerator<TSource> enumerator = source.GetEnumerator())
{
if (enumerator.MoveNext())
{
TSource current;
do
{
current = enumerator.Current;
}
while (enumerator.MoveNext());
return current;
}
}
}
throw Error.NoElements();
}
You can see that it enumerates through the whole sequence and then just returns the last element.
If you're so concerned about performance of string.Last() then you can get the best of both worlds by implementing your own overload of Last(). If your overload is a better match than Enumerable.Last() then yours will be used.
internal class Program
{
private static void Main()
{
Console.WriteLine("Hello".Last());
}
}
public static class StringExtensions
{
public static char Last(this string text)
{
if (text == null)
{
throw new ArgumentNullException("text");
}
int length = text.Length;
if (length == 0)
{
throw new ArgumentException("Argument cannot be empty.", "text");
}
return text[length - 1];
}
}
If you want to risk it and take out the argument checks, you can do that too, but I wouldn't.
I tested to confirm StringExtensions.Last() is being called even though I use this technique often enough to know for sure it works. :-)
Note: In order for your overload to be called the variable must be declared as a string so the compiler knows it's a string. If it's an IEnumerable<char> that happens to be a string at runtime, the more efficient method will not be called, example:
private static void Main()
{
IEnumerable<char> s = "Hello";
Console.WriteLine(s.Last());
}
Here StringExtensions.Last() is not called because the compiler doesn't know s is a string, it only knows it's IEnumerable<char> (remember member overload resolution is decided at compile time). For strings this is not much of a concern, but for other optimizations it can be.

The best way to throw an exception

Do you know a better (prettier) way than the one below to throw an exception?
public long GetPlaylistId(long songInPlaylistId)
{
var songInPlaylist = service.GetById(songInPlaylistId);
return songInPlaylist
.With(x => x.Playlist)
.ReturnValueOrException(x => x.Id,
new ArgumentException(
"Bad argument 'songInPlaylistId'"));
}
Monadic extension methods:
public static TResult With<TInput, TResult>(this TInput obj,
Func<TInput, TResult> evaluator)
where TInput : class
where TResult : class
{
return obj == null ? null : evaluator(obj);
}
public static TResult ReturnValueOrException<TInput, TResult>(
this TInput obj, Func<TInput, TResult> evaluator, Exception exception)
where TInput : class
{
if (obj != null)
{
return evaluator(obj);
}
throw exception;
}
If it is valid to try to get the playlist for something that doesn't have a playlist, then you should not throw an exception but should just return a special value that means "not found" instead (for example, 0 or -1 depending on how your playlist IDs work).
Alternatively you could write a TryGetPlaylistId() method which works in a similar way to Microsoft's TryXXX() methods (e.g. SortedList.TryGetValue()), for example:
public bool TryGetPlaylistId(long songInPlaylistId, out long result)
{
result = 0;
var songInPlaylist = service.GetById(songInPlaylistId);
if (songInPlaylist == null)
return false;
if (songInPlaylist.Playlist == null)
return false;
result = songInPlaylist.Playlist.Id;
return true;
}
A small problem with this approach is that you are obscuring information that might be of use when trying to diagnose issues. Perhaps adding Debug.WriteLine() or some other form of logging would be of use. The point being, you can't differentiate between the case where the playlist ID is not found, and the case where it is found but doesn't contain a playlist.
Otherwise, you could throw an exception which has a more informative message, for example:
public long GetPlaylistId(long songInPlaylistId)
{
var songInPlaylist = service.GetById(songInPlaylistId);
if (songInPlaylist == null)
throw new InvalidOperationException("songInPlaylistId not found: " + songInPlaylistId);
if (songInPlaylist.Playlist == null)
throw new InvalidOperationException("Song with ID " + songInPlaylistId + " has no playlist.");
return songInPlaylist.Playlist.Id;
}
It might be the case that it is valid to not find the song in the playlist, but it is NOT valid to find one which does not have a playlist, in which case you would return a special value in the first case and throw an exception in the second case, for example:
public long GetPlaylistId(long songInPlaylistId)
{
var songInPlaylist = service.GetById(songInPlaylistId);
if (songInPlaylist == null)
return -1; // -1 means "playlist not found".
if (songInPlaylist.Playlist == null)
throw new InvalidOperationException("Song with ID " + songInPlaylistId + " has no playlist.");
return songInPlaylist.Playlist.Id;
}
In any case, I personally think that your extension methods are just obscuring the code.
You should not throw an error unless it is going to be caught somewhere. It is better to return null in the given case and handle it in your calling code:
try
{
if (obj != null)
{
return evaluator(obj);
}
}
catch (Exception)
{
throw;
}
return default(TResult);
And what will happen if I have more than one such ambiguous method in my class? It's very difficult to invent different rules for every method. You will end up confused.
What do you think about this solution?
public class ApplicationResponse
{
public IList<string> Errors { get; set; }
public dynamic Data { get; set; }
public bool HasErrors()
{
return Errors != null && Errors.Any();
}
}
public ApplicationResponse GetPlaylistId(long songInPlaylistId)
{
var songInPlaylist = service.GetById(songInPlaylistId);
if (songInPlaylist == null)
{
return new ApplicationResponse { Errors = new[] { "Song was not found." } };
}
if (songInPlaylist.Playlist == null)
{
return new ApplicationResponse { Errors = new[] { "Playlist was not found." } };
}
return new ApplicationResponse { Data = songInPlaylist.Playlist.Id };
}
public HttpResponseMessage SomeRequest([FromUri] long songInPlaylistId)
{
var response = appService.GetPlaylistId(songInPlaylistId);
if (response.HasErrors())
{
// reply with error status
}
// reply with ok status
}
In such a case I can send all the errors to the client.
