What is the "if (predicate(item, index))" operation in LINQ? - C#

I am fairly new to LINQ. I am looking at this code and I'm not sure I understand it properly. I realize that it is an extension method and a generic method, but what is predicate(item, index) doing (let's say I pass in an array of ints when calling this method)?
I know that predicate is a delegate, but maybe I just don't know how delegates work. Does anyone have a good example/explanation they'd like to give? Also, what is the yield keyword? Is it only used in LINQ?
private static IEnumerable<TSource> WhereImpl<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, int, bool> predicate)
{
    int index = 0;
    foreach (TSource item in source)
    {
        if (predicate(item, index))
        {
            yield return item;
        }
        index++;
    }
}
I am trying to follow Reimplementing LINQ to Objects: Part 2 - "Where" from Skeet's blog.

predicate is declared as Func<TSource, int, bool>, which means a method that takes a TSource and an int and returns a bool, i.e. a predicate. predicate(item, index) simply calls that method with the current item and its index.
An example for TSource = string could be (totally made up):
bool IsLengthLargerThan(string s, int length)
{
    return s.Length > length;
}
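To make the delegate part concrete, here is a minimal sketch. The class and method names PredicateDemo, LongerThanIndex and EvenPositions are made up for illustration. It passes the made-up IsLengthLargerThan method to LINQ's built-in Where overload that takes a Func<TSource, int, bool> (which behaves like WhereImpl above), and uses a lambda with an int array:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; Enumerable.Where has a built-in overload taking Func<TSource, int, bool>.
public static class PredicateDemo
{
    // Same shape as the made-up example above: (string, int) -> bool.
    public static bool IsLengthLargerThan(string s, int length) => s.Length > length;

    // A named method can be passed wherever a Func<string, int, bool> is expected.
    // Where supplies each word and its index, so this keeps words longer than their index.
    public static List<string> LongerThanIndex(IEnumerable<string> words) =>
        words.Where(IsLengthLargerThan).ToList();

    // With an int array, the lambda receives (item, index); this keeps items at even positions.
    public static int[] EvenPositions(int[] numbers) =>
        numbers.Where((item, index) => index % 2 == 0).ToArray();
}
```

Note that IsLengthLargerThan is passed without parentheses: Where receives the method itself (as a delegate) and calls it once per element, exactly like predicate(item, index) in WhereImpl.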
Also, what is yield keyword, is it just used in linq stuff?
yield is specific to iterator blocks, which have been around since C# 2.0, before LINQ. The compiler turns an iterator block into a state machine: yield return item; returns item to the caller and suspends execution, and once the caller requests the next item, execution resumes on the line after the yield return. It's easiest to see how it works if you step through it with a debugger.
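As a small illustration of the suspend/resume behaviour without any LINQ, here is a sketch of a plain iterator block. The YieldDemo class and the log parameter are hypothetical, added only so you can see when the body runs relative to the caller's requests:

```csharp
using System.Collections.Generic;

// Hypothetical demo: an iterator block with no LINQ involved.
public static class YieldDemo
{
    // The log records when the body actually runs relative to the caller's requests.
    public static IEnumerable<int> CountUp(int max, List<string> log)
    {
        for (int i = 1; i <= max; i++)
        {
            log.Add($"producing {i}");
            yield return i;                 // hand i to the caller and suspend here
            log.Add($"resumed after {i}");  // runs only when the next item is requested
        }
    }
}
```

Calling CountUp alone runs none of the body; each MoveNext() on its enumerator runs it up to the next yield return and stops there.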

First, predicate(item, index) invokes a delegate that takes the current item in the enumeration and the index of that item. So if you started with an array of integers, item would be an integer and index would be its position in the array. Note that the index is the position within the sequence this particular call is enumerating, not within the original source: if an earlier Where filtered out the first three items, the item originally at index 3 would be passed index 0.
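The index-restarting behaviour can be checked with a short sketch (IndexDemo and SeenBySecondWhere are hypothetical names). It chains two Where calls and records which (item, index) pairs the second, indexed predicate actually receives:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo: records which (item, index) pairs a second, indexed Where sees.
public static class IndexDemo
{
    public static List<(int Item, int Index)> SeenBySecondWhere(int[] numbers)
    {
        var seen = new List<(int Item, int Index)>();
        numbers
            .Where(n => n >= 3)                                          // first filter
            .Where((n, index) => { seen.Add((n, index)); return true; }) // indexes restart at 0 here
            .ToList();                                                   // force the lazy query to run
        return seen;
    }
}
```

For the input { 0, 1, 2, 3, 4 }, the second predicate sees (3, 0) and (4, 1): the index counts positions in the already-filtered sequence.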
The yield keyword is a C# keyword that makes it easy to produce an IEnumerable (an iterator) without building a collection yourself.

predicate(..) is a function that takes two parameters, a TSource (a string in the example above) and an int, and returns true or false. yield return essentially says "yes, add this to the IEnumerable I'll be returning, but let's keep looking for others".
So the loop executes the predicate function with an item and an int. index is initialized to 0 and incremented on every iteration; that's the second parameter to predicate. When the function returns true, you say "add it to my return collection" (conceptually: the item is handed to the caller as soon as it is found, nothing is actually stored).

Related

Why are there two completely different versions of Reverse for List and IEnumerable?

For the List object, we have a method called Reverse().
It reverses the order of the list in place; it doesn't return anything.
For the IEnumerable object, we have an extension method called Reverse().
It returns another IEnumerable.
I need to iterate through a list in reverse order, so I can't directly use the second method: I have a List, and I don't want to reverse it, just iterate over it backwards.
So I can either do this :
for(int i = list.Count - 1; i >=0; i--)
Or
foreach(var item in list.AsEnumerable().Reverse())
I find this less readable than what I could write if I had an IEnumerable:
foreach(var item in list.Reverse())
I can't understand why these two methods have been implemented this way, with the same name. It is pretty annoying and confusing.
Why is there not an extension called BackwardsIterator() in place of Reverse(), working for all IEnumerables?
I'm much more interested in the historical reason for this choice than in the 'how to do it' stuff!
It is worth noting that the list method is a lot older than the extension method. The naming was likely kept the same as Reverse seems more succinct than BackwardsIterator.
If you want to bypass the list version and go to the extension method, you need to treat the list like an IEnumerable<T>:
var numbers = new List<int>();
numbers.Reverse(); // hits list
(numbers as IEnumerable<int>).Reverse(); // hits extension
Or call the extension method as a static method:
Enumerable.Reverse(numbers);
Note that the Enumerable version will need to iterate the underlying enumerable entirely in order to start iterating it in reverse. If you plan on doing this multiple times over the same enumerable, consider permanently reversing the order and iterating it normally.
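A minimal sketch of the two dispatch paths described above (ReverseDispatchDemo is a made-up name). It shows that the instance method mutates the list, while the extension method reached through the interface leaves it untouched:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo contrasting List<T>.Reverse() with Enumerable.Reverse().
public static class ReverseDispatchDemo
{
    public static (List<int> InPlace, List<int> ViaInterface, List<int> Untouched) Run()
    {
        var a = new List<int> { 1, 2, 3 };
        a.Reverse();                        // List<T>.Reverse(): reverses a itself, returns void

        var b = new List<int> { 1, 2, 3 };
        // Enumerable.Reverse via the interface: a new reversed sequence, b is not modified
        List<int> viaInterface = ((IEnumerable<int>)b).Reverse().ToList();

        return (a, viaInterface, b);
    }
}
```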
Write your own BackwardsIterator then!
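// Extension methods must be declared in a static class (the class name is arbitrary).
public static class ListExtensions
{
    public static IEnumerable<T> BackwardsIterator<T>(this List<T> lst)
    {
        for (int i = lst.Count - 1; i >= 0; i--)
        {
            yield return lst[i];
        }
    }
}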
The existence of List<T>.Reverse long preceded the existence of IEnumerable<T>.Reverse. The reason they are named the same is ... incompetence. It's a horrible botch; clearly the Linq IEnumerable<T> function should have been given a different name ... e.g., Backwards ... since they have quite different semantics. As it is, it lays an awful trap for programmers -- someone might change the type of list from List<T> to, e.g., Collection<T>, and suddenly list.Reverse();, rather than reversing list in place, simply returns an IEnumerable<T> that is discarded. It cannot be overstated just how incompetent it was of MS to give these methods the same name.
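The trap described above can be reproduced in a few lines. This is a sketch with a hypothetical ReverseTrapDemo class: System.Collections.ObjectModel.Collection<T> has no instance Reverse(), so the call silently binds to Enumerable.Reverse and its result is thrown away:

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical demo of the List<T> -> Collection<T> trap.
public static class ReverseTrapDemo
{
    public static List<int> ReverseCollection()
    {
        var items = new Collection<int> { 1, 2, 3 };
        // Collection<T> has no instance Reverse(), so this binds to Enumerable.Reverse:
        // it builds a reversed sequence that is immediately discarded.
        items.Reverse();
        return items.ToList();   // still 1, 2, 3
    }
}
```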
To avoid the problem you can define your own extension method
public static IEnumerable<T> Backwards<T>(this IEnumerable<T> source) => source.Reverse();
You can even add a special case for efficient processing of indexable lists:
public static IEnumerable<T> Backwards<T>(this IEnumerable<T> source) =>
    source is IList<T> list ? Backwards<T>(list) : source.Reverse();
public static IEnumerable<T> Backwards<T>(this IList<T> list)
{
    for (int x = list.Count; --x >= 0;)
        yield return list[x];
}

Linq: The "opposite" of Take?

Using LINQ, how can I do the "opposite" of Take?
I.e. instead of getting the first n elements such as in
aCollection.Take(n)
I want to get everything but the last n elements. Something like
aCollection.Leave(n)
(Don't ask why :-)
Edit
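I suppose I can do it this way:
aCollection.TakeWhile((x, index) => index < aCollection.Count - n)
Or in the form of an extension:
public static IEnumerable<TSource> Leave<TSource>(this IEnumerable<TSource> source, int n)
{
    // Count once up front; calling source.Count() inside the lambda would
    // recount (and possibly re-enumerate) the source for every element.
    int count = source.Count();
    return source.TakeWhile((x, index) => index < count - n);
}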
But in the case of LINQ to SQL or NHibernate LINQ, it would have been nice if the generated SQL took care of it and generated something like (for SQL Server/T-SQL):
SELECT TOP(SELECT COUNT(*) -#n FROM ATable) * FROM ATable
Or some other, more clever SQL implementation.
I suppose there is nothing like it?
(But the edit was actually not part of the question.)
aCollection.Take(aCollection.Count() - n);
EDIT: Just as a piece of interesting information that came up in the comments: you might think that the IEnumerable<T> extension method .Count() is slow, because it iterates through all elements. But if the actual object implements ICollection or ICollection<T>, it just uses the .Count property, which should be O(1). So performance will not suffer in that case.
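For the other case, a lazy source that is not an ICollection<T>, here is a sketch (TakeCountDemo and its Numbers helper are hypothetical) that counts how many times the source gets started when using Take(Count() - n):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo: a lazy source that is not an ICollection<T>, so Count() has to walk it.
public static class TakeCountDemo
{
    public static int Enumerations;   // how many times the source body was started

    static IEnumerable<int> Numbers(int max)
    {
        Enumerations++;               // runs on the first MoveNext of each enumeration
        for (int i = 1; i <= max; i++)
            yield return i;
    }

    // "Everything but the last n" via Take(Count() - n).
    public static List<int> AllButLast(int max, int n)
    {
        IEnumerable<int> source = Numbers(max);
        return source.Take(source.Count() - n).ToList();
    }
}
```

Here the source is enumerated twice: once by Count() and once by Take. For a List<T> or array, Count() would read the Count property instead. If n exceeds the number of items, Take receives a negative count and simply returns an empty sequence.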
You can see the source code of IEnumerable.Count() at TypeDescriptor.net.
I'm pretty sure there's no built-in method for this, but this can be done easily by chaining Reverse and Skip:
aCollection.Reverse().Skip(n).Reverse()
I don't believe there's a built-in function for this.
aCollection.Take(aCollection.Count - n)
should be suitable: taking the total number of items in the collection minus n leaves out the last n elements.
Keeping with the IEnumerable philosophy, and going through the enumeration only once when ICollection<T> isn't implemented, you can use these extension methods:
public static IEnumerable<T> Leave<T>(this ICollection<T> src, int drop) => src.Take(src.Count - drop);
public static IEnumerable<T> Leave<T>(this IEnumerable<T> src, int drop) {
    IEnumerable<T> IEnumHelper() {
        // Copy drop so that enumerating the result a second time starts from the original value.
        int remaining = drop;
        using (var esrc = src.GetEnumerator()) {
            var buf = new Queue<T>();
            // Pre-fill the buffer with the first 'drop' items.
            while (remaining-- > 0)
                if (esrc.MoveNext())
                    buf.Enqueue(esrc.Current);
                else
                    break;
            // Each new item pushes the oldest buffered one out; the last 'drop' items stay behind.
            while (esrc.MoveNext()) {
                buf.Enqueue(esrc.Current);
                yield return buf.Dequeue();
            }
        }
    }
    return (src is ICollection<T> csrc) ? csrc.Leave(drop) : IEnumHelper();
}
This will be much more efficient than the solutions with a double-reverse, since it creates only one list and only enumerates the list once.
public static class Extensions
{
    public static IEnumerable<T> Leave<T>(this IEnumerable<T> items, int numToSkip)
    {
        var list = items.ToList();
        // Assumes numToSkip <= list.Count; otherwise RemoveRange throws.
        list.RemoveRange(list.Count - numToSkip, numToSkip);
        return list;
    }
}
string alphabet = "abcdefghijklmnopqrstuvwxyz";
var chars = alphabet.Leave(10); // abcdefghijklmnop
Newer versions of .NET also have a TakeLast(n) extension method, which takes the last n elements of a sequence (for a string, the last n characters).
See here: https://msdn.microsoft.com/en-us/library/hh212114(v=vs.103).aspx
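TakeLast keeps the end of the sequence, which is the reverse of what was asked. Alongside it, System.Linq in .NET Core 2.0+ / .NET Standard 2.1 also provides SkipLast(n), which returns everything except the last n elements. A short sketch, assuming one of those targets (SkipLastDemo is a made-up class name):

```csharp
using System.Linq;

// Hypothetical demo; requires SkipLast/TakeLast (.NET Core 2.0+ / .NET Standard 2.1).
public static class SkipLastDemo
{
    // SkipLast(n): everything except the last n elements (the "Leave" asked about above).
    public static string AllButLast(string s, int n) => new string(s.SkipLast(n).ToArray());

    // TakeLast(n): only the last n elements.
    public static string LastOnly(string s, int n) => new string(s.TakeLast(n).ToArray());
}
```

With the alphabet and n = 10, AllButLast gives "abcdefghijklmnop" (matching the Leave example above) and LastOnly gives "qrstuvwxyz".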

Returning list from method - shorthand?

Rather than declaring a list at the start of the method, adding to it and then returning it, I'm sure there's some shorthand return statement that can be written in a loop, for example, to save the extra code (declaring etc.), but I've forgotten it. Does anybody know what I mean?
Use yield:
public IEnumerable<int> BuildList()
{
    yield return 1;
    yield return 2;
}
I think you are looking for yield return.
You can use it like this to return elements from a loop:
public IEnumerable<T> GetElements()
{
    foreach (T t in listOfT)
    {
        // do some work
        yield return t;
        // code will continue here on the next iteration
    }
}
Be aware that you can often use LINQ extension methods to do some work on all the elements of a list without writing a method with a loop, like filtering the list for elements that satisfy some condition or performing an operation on every element.
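To compare the two suggestions side by side, here is a sketch (ShorthandDemo and the even-squares task are made up for illustration). The same result is written once as an iterator with yield return and once with LINQ operators:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo: the same filter-and-transform written two ways.
public static class ShorthandDemo
{
    // Iterator version: no list is declared; each matching value is yielded as it is found.
    public static IEnumerable<int> EvenSquares(IEnumerable<int> source)
    {
        foreach (int x in source)
        {
            if (x % 2 == 0)
                yield return x * x;
        }
    }

    // Same result expressed with LINQ operators instead of a hand-written loop.
    public static IEnumerable<int> EvenSquaresLinq(IEnumerable<int> source) =>
        source.Where(x => x % 2 == 0).Select(x => x * x);
}
```

Both return a lazy IEnumerable<int>. If the caller really needs a List<int>, it can call .ToList() on the result.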

Explain Linq Microsoft Select - Indexed [Example]

I'm running through Microsoft's 101 LINQ Samples, and I'm stumped on how this query knows how to assign the correct int value to the correct int field:
public void Linq12()
{
    int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
    var numsInPlace = numbers.Select((num, index) => new { Num = num, InPlace = (num == index) });
    Console.WriteLine("Number: In-place?");
    foreach (var n in numsInPlace)
    {
        Console.WriteLine("{0}: {1}", n.Num, n.InPlace);
    }
}
I saw in SO #336758 that there have been errors in the examples before, but it is much more likely that I am just missing something.
Could someone explain this and how the compiler knows how to interpret this data correctly?
EDIT:
OK, I think my confusion comes from the LINQ extension that enables the Select feature to work. The Func and two int parameters IEnumerable<TResult> IEnumerable<int>.Select(Func<int,int,TResult> selector) are most likely the key to my lack of understanding.
I'm not really sure what you are asking, but Select iterates over the list starting at index 0. If the value of the element at the current index is equal to the index, it sets the InPlace property of the anonymous object to true. My guess is that the code above prints true for 3, 6 and 7, right?
It would also make it easier to explain if you wrote what you don't understand.
Jon Skeet has written a series of blog posts in which he reimplements LINQ; read about Select here: Reimplementation of Select
UPDATE: From one of your comments on another answer, it seems that it is the lambda, not LINQ itself, that is confusing you. If you read Skeet's blog post, you'll see that Select has two overloads:
public static IEnumerable<TResult> Select<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, TResult> selector)
public static IEnumerable<TResult> Select<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, int, TResult> selector)
The Select with index matches the second overload. As you can see, it is an extension method on IEnumerable<TSource>, which in your case is the array of ints, so you are calling Select on an IEnumerable<int> and the signature of Select becomes: Select<int, TResult>(this IEnumerable<int> source, Func<int, int, TResult> selector). I replaced TSource with int, since that is the generic type of your IEnumerable<int>. TResult stays, since you are using an anonymous type. Does that explain some parts?
Looks correct to me.
First, you have an anonymous type with Num and InPlace being created. Then LINQ's Select just iterates over the elements, passing each element and the index of that element. If you were to rewrite it without LINQ and anonymous classes, it would look like this:
class NumsInPlace
{
    public int Num { get; set; }
    public bool InPlace { get; set; }
}
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
List<NumsInPlace> numsInPlace = new List<NumsInPlace>();
for (int index = 0; index < numbers.Length; index++)
{
    int num = numbers[index];
    numsInPlace.Add(new NumsInPlace() { Num = num, InPlace = (index == num) });
}
Console.WriteLine("Number: In-place?");
foreach (var n in numsInPlace)
{
    Console.WriteLine("{0}: {1}", n.Num, n.InPlace);
}
The MSDN documentation on Enumerable.Select has the details, but the projection function (num, index) always receives the item first and the index second (when it takes an index).
how does the compiler know that LINQ Select is using the index value as an index to the array?
The compiler doesn't know about index values. The implementation of that overload of Select knows about index values.
//all the compiler sees is a method that accepts 2 int parameters and returns a bool.
Func<int, int, bool> twoIntFunc = (x, y) => (x == y);
//The compiler sees there's an overload of Enumerable.Select which accepts such a method.
IEnumerable<bool> query = numbers.Select(twoIntFunc);
Enumerable.Select's implementation does the rest of the work by calling that method with the appropriate parameters.
The first argument to selector represents the element to process. The second argument to selector represents the zero-based index of that element in the source sequence.
So - Select will call your method with (5, 0) first, then Select calls it with (4, 1).
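To show where the index actually comes from, here is a sketch of a hypothetical SelectWithIndex extension, in the spirit of Skeet's reimplementation (argument validation omitted). It is not the real Enumerable.Select source. The method keeps its own counter and passes it as the second argument; the lambda just receives it:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical re-implementation of the indexed Select overload, for illustration only.
public static class MySelect
{
    // Select itself supplies the index; the selector receives (item, index).
    public static IEnumerable<TResult> SelectWithIndex<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, int, TResult> selector)
    {
        int index = 0;
        foreach (TSource item in source)
        {
            yield return selector(item, index);
            index++;
        }
    }
}
```

Calling numbers.SelectWithIndex((num, index) => num == index) on the sample array yields true only at positions 3, 6 and 7.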
The lambda expression (num, index) => new { Num = num, InPlace = (num == index) } is executed once for each element in the input sequence and is passed the item and its index as arguments.
Update
Lambda expressions can be implicitly typed, that is, from the required type, the compiler can figure out (or imply) what types you intend the arguments to be.
(num, index) => new { Num = num, InPlace = (num == index) }
is equivalent to
someAnonymousType MyMethod(int num, int index)
{
    return new
    {
        Num = num,
        InPlace = (num == index)
    };
}
Obviously you can't write the latter, because you can't type the name of an anonymous type, but the compiler can ;)
The compiler knows this because the overload of Select that you're using accepts a Func<TSource, Int32, TResult>: a Func that takes two arguments, one of type TSource (the element type of your IEnumerable<T>, in this case int) and an Int32 (which represents the index), and returns an object of type TResult, which is whatever you choose to return from your function, in this case an anonymous type.
The lambda can be converted to the required delegate type, and therefore it just works.
The second argument in the Select is the index, which increments as Select traverses the array of numbers at run time. The selector sees:
num  index  num == index
  5      0  false
  4      1  false
  1      2  false
  3      3  true
  9      4  false
  8      5  false
  6      6  true
  7      7  true
  2      8  false
  0      9  false
It yields true for 3, 6 and 7. Remember that the index starts at 0.

Getting head and tail from IEnumerable that can only be iterated once

I have a sequence of elements. The sequence can only be iterated once and can be "infinite".
What is the best way to get the head and the tail of such a sequence?
Update: A few clarifications that would have been nice to include in the original question :)
Head is the first element of the sequence and tail is "the rest". That means the tail is also "infinite".
When I say infinite, I mean "very large" and "I wouldn't want to store it all in memory at once". It could also have been actually infinite, like sensor data for example (but it wasn't in my case).
When I say that it can only be iterated once, I mean that generating the sequence is resource-heavy, so I wouldn't want to do it again. It could also have been volatile data, again like sensor data, that won't be the same on the next read (but it wasn't in my case).
Decomposing IEnumerable<T> into head & tail isn't particularly good for recursive processing (unlike functional lists) because when you use the tail operation recursively, you'll create a number of indirections. However, you can write something like this:
I'm ignoring things like argument checking and exception handling, but it shows the idea...
Tuple<T, IEnumerable<T>> HeadAndTail<T>(IEnumerable<T> source) {
    // Get first element of the 'source' (assuming it is there)
    var en = source.GetEnumerator();
    en.MoveNext();
    // Return first element and Enumerable that iterates over the rest
    return Tuple.Create(en.Current, EnumerateTail(en));
}
// Turn remaining (unconsumed) elements of enumerator into enumerable
IEnumerable<T> EnumerateTail<T>(IEnumerator<T> en) {
    while (en.MoveNext()) yield return en.Current;
}
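Here is a self-contained sketch of the same idea that also disposes the enumerator and checks that an expensive source is started only once. The names HeadTailDemo, Naturals, Rest and the Starts counter are hypothetical, and a value tuple replaces Tuple<T, IEnumerable<T>>:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo of head/tail over a one-shot, possibly infinite sequence.
public static class HeadTailDemo
{
    public static int Starts;   // counts how many times the source is (re)started

    // A sequence we pretend is expensive / one-shot; it is also infinite.
    public static IEnumerable<int> Naturals()
    {
        Starts++;
        for (int i = 1; ; i++) yield return i;
    }

    public static (T Head, IEnumerable<T> Tail) HeadAndTail<T>(IEnumerable<T> source)
    {
        var en = source.GetEnumerator();
        if (!en.MoveNext())
        {
            en.Dispose();
            throw new InvalidOperationException("Sequence is empty.");
        }
        return (en.Current, Rest(en));
    }

    // Continues from wherever the enumerator currently is; disposes it when done or abandoned.
    static IEnumerable<T> Rest<T>(IEnumerator<T> en)
    {
        using (en)
        {
            while (en.MoveNext()) yield return en.Current;
        }
    }
}
```

Taking three items from the tail of Naturals() gives 2, 3, 4, and Starts stays at 1. Because the tail wraps a single enumerator, it should be enumerated only once.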
The HeadAndTail method gets the first element and returns it as the first element of a tuple. The second element of a tuple is IEnumerable<T> that's generated from the remaining elements (by iterating over the rest of the enumerator that we already created).
Obviously, each call to HeadAndTail should enumerate the sequence again (unless there is some sort of caching used). For example, consider the following:
var a = HeadAndTail(sequence);
Console.WriteLine(HeadAndTail(a.Tail).Tail);
//Element #2; enumerator is at least at #2 now.
var b = HeadAndTail(sequence);
Console.WriteLine(b.Tail);
//Element #1; there is no way to get #1 unless we enumerate the sequence again.
For the same reason, HeadAndTail could not be implemented as separate Head and Tail methods (unless you want even the first call to Tail to enumerate the sequence again even if it was already enumerated by a call to Head).
Additionally, HeadAndTail should not return an instance of IEnumerable (as it could be enumerated multiple times).
This leaves us with the only option: HeadAndTail should return IEnumerator, and, to make things more obvious, it should accept IEnumerator as well (we're just moving an invocation of GetEnumerator from inside the HeadAndTail to the outside, to emphasize it is of one-time use only).
Now that we have worked out the requirements, the implementation is pretty straightforward:
class HeadAndTail<T> {
    public readonly T Head;
    public readonly IEnumerator<T> Tail;
    public HeadAndTail(T head, IEnumerator<T> tail) {
        Head = head;
        Tail = tail;
    }
}
static class IEnumeratorExtensions {
    public static HeadAndTail<T> HeadAndTail<T>(this IEnumerator<T> enumerator) {
        if (!enumerator.MoveNext()) return null;
        return new HeadAndTail<T>(enumerator.Current, enumerator);
    }
}
And now it can be used like this:
Console.WriteLine(sequence.GetEnumerator().HeadAndTail().Tail.HeadAndTail().Head);
//Element #2
Or in recursive functions like this:
TResult FoldR<TSource, TResult>(
    IEnumerator<TSource> sequence,
    TResult seed,
    Func<TSource, TResult, TResult> f
) {
    var headAndTail = sequence.HeadAndTail();
    if (headAndTail == null) return seed;
    return f(headAndTail.Head, FoldR(headAndTail.Tail, seed, f));
}
int Sum(IEnumerator<int> sequence) {
    return FoldR(sequence, 0, (x, y) => x + y);
}
var array = Enumerable.Range(1, 5);
Console.WriteLine(Sum(array.GetEnumerator())); // 1+(2+(3+(4+(5+0)))) = 15
While other approaches here suggest using yield return for the tail enumerable, such an approach adds unnecessary nesting overhead. A better approach would be to convert the IEnumerator<T> back into something that can be used with foreach:
public struct WrappedEnumerator<T>
{
    T myEnumerator;
    public T GetEnumerator() { return myEnumerator; }
    public WrappedEnumerator(T theEnumerator) { myEnumerator = theEnumerator; }
}
public static class AsForEachHelper
{
    static public WrappedEnumerator<IEnumerator<T>> AsForEach<T>(this IEnumerator<T> theEnumerator)
    { return new WrappedEnumerator<IEnumerator<T>>(theEnumerator); }
    static public WrappedEnumerator<System.Collections.IEnumerator> AsForEach(this System.Collections.IEnumerator theEnumerator)
    { return new WrappedEnumerator<System.Collections.IEnumerator>(theEnumerator); }
}
If one used separate WrappedEnumerator structs for the generic IEnumerator<T> and the non-generic IEnumerator, one could have them implement IEnumerable<T> and IEnumerable respectively; they wouldn't really obey the IEnumerable<T> contract, though, which specifies that it should be possible to call GetEnumerator() multiple times, with each call returning an independent enumerator.
Another important caveat is that if one uses AsForEach on an IEnumerator<T>, the resulting WrappedEnumerator should be enumerated exactly once. If it is never enumerated, the underlying IEnumerator<T> will never have its Dispose method called.
Applying the above-supplied methods to the problem at hand, it would be easy to call GetEnumerator() on an IEnumerable<T>, read out the first few items, and then use AsForEach() to convert the remainder so it can be used with a foreach loop (or perhaps, as noted above, to convert it into an implementation of IEnumerable<T>). It's important to note, however, that calling GetEnumerator() creates an obligation to Dispose the resulting IEnumerator<T>, and the class that performs the head/tail split would have no way to do that if nothing ever calls GetEnumerator() on the tail.
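A compact sketch of that approach with the disposal obligation handled by the caller. The names HeadThenRest, Remaining and Split are hypothetical, and Remaining is a trimmed-down variant of the WrappedEnumerator idea. It reads the head, then uses foreach on the same enumerator via a pattern-based wrapper:

```csharp
using System.Collections.Generic;

// Hypothetical demo: read the head, then foreach over the rest of the same enumerator.
public static class HeadThenRest
{
    // Pattern-based wrapper: foreach only needs a public GetEnumerator() method.
    public readonly struct Remaining<T>
    {
        private readonly IEnumerator<T> _en;
        public Remaining(IEnumerator<T> en) { _en = en; }
        public IEnumerator<T> GetEnumerator() => _en;
    }

    // Returns the first element and the sum of the rest; (0, 0) for an empty source (a simplification).
    public static (int Head, int RestSum) Split(IEnumerable<int> source)
    {
        // We own the enumerator, so we dispose it; this also covers the early return below.
        using (var en = source.GetEnumerator())
        {
            if (!en.MoveNext()) return (0, 0);
            int head = en.Current;
            int sum = 0;
            // foreach also disposes en at the end (IEnumerator<T> is IDisposable); a second Dispose is harmless.
            foreach (var x in new Remaining<int>(en)) sum += x;
            return (head, sum);
        }
    }
}
```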
Probably not the best way to do it, but if you use the .ToList() method you can then get the elements at positions [0] and [Count-1], if Count > 0.
But you should specify what you mean by "can be iterated only once".
What exactly is wrong with .First() and .Last()? Though yeah, I have to agree with the people who asked "what does the tail of an infinite list mean"... the notion doesn't make sense, IMO.
