How does index in the below example obtain its value? I understand that n is automatically obtained from the source numbers, but, while the meaning is clear, I do not see how index is given its value:
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
var firstSmallNumbers = numbers.TakeWhile((n, index) => n >= index);
The signature of TakeWhile is:
public static IEnumerable<TSource> TakeWhile<TSource>(this IEnumerable<TSource> source, Func<TSource, int, bool> predicate);
This version of TakeWhile supplies the index of the source element in the sequence as the second parameter to the predicate. I.e. the predicate is called as predicate(5, 0), then predicate(4, 1), then predicate(1, 2); that last call returns false, so enumeration stops there. See the MSDN documentation.
There is also a “simpler” version of the function, supplying only the values in the sequence, see MSDN.
The index is generated by the implementation of TakeWhile, which might look a bit like this.
Things become clear once you see how TakeWhile could be implemented:
public static IEnumerable<TSource> TakeWhile<TSource>(this IEnumerable<TSource> source, Func<TSource, int, bool> predicate)
{
int index = 0;
foreach (TSource item in source)
{
if (predicate(item, index))
{
yield return item;
}
else
{
yield break;
}
index++;
}
}
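To watch the index being supplied, one option is to log every call the built-in Enumerable.TakeWhile makes to the predicate. This is only a small sketch; the names TakeWhileDemo and Run are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TakeWhileDemo
{
    // Records every (value, index) pair that TakeWhile passes to the predicate.
    public static List<int> Run(List<string> calls)
    {
        int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
        return numbers
            .TakeWhile((n, index) =>
            {
                calls.Add($"predicate({n}, {index})");
                return n >= index;
            })
            .ToList();
    }

    public static void Main()
    {
        var calls = new List<string>();
        List<int> result = Run(calls);
        Console.WriteLine(string.Join(", ", calls));  // predicate(5, 0), predicate(4, 1), predicate(1, 2)
        Console.WriteLine(string.Join(", ", result)); // 5, 4
    }
}
```

The caller never mentions an index; it only names the second lambda parameter, and TakeWhile fills it in.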
Related
I have a list of integers and I need to find the last occurrence that matches a predicate. To use a very simple example:
var myList = new List<int> { 1, 5, 6, 20, 18, 2, 3, 0, 4 };
var lastMatch = myList.FindLast(e => e == 0 || e == 2);
This seems like the perfect use case for FindLast. The problem is that this method returns default(T) if nothing was found, which for integers is 0, itself a valid value. So the question is: if this method returns 0, how can I know whether it found something or not? Is there a better method for this case with ints?
Use FindLastIndex instead. If the index is negative no match was found. If it isn't negative: that's the index you want, so: use the indexer with that index.
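A minimal sketch of that suggestion, using List<T>.FindLastIndex on the list from the question. The helper name LastMatchOrNull is hypothetical and only there to show how the "not found" case can be kept apart from a real 0.

```csharp
using System;
using System.Collections.Generic;

public static class FindLastIndexDemo
{
    // Returns the last element matching the predicate, or null when nothing matches.
    public static int? LastMatchOrNull(List<int> list, Predicate<int> predicate)
    {
        int index = list.FindLastIndex(predicate);
        return index >= 0 ? list[index] : (int?)null;
    }

    public static void Main()
    {
        var myList = new List<int> { 1, 5, 6, 20, 18, 2, 3, 0, 4 };

        Console.WriteLine(myList.FindLastIndex(e => e == 0 || e == 2));    // 7
        Console.WriteLine(LastMatchOrNull(myList, e => e == 0 || e == 2)); // 0
        Console.WriteLine(LastMatchOrNull(myList, e => e > 100).HasValue); // False
    }
}
```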
As an alternative to @MarcGravell's answer:
Instead of the List FindLast method, you could use the LINQ Last extension method overload that takes a predicate as an argument. It will throw an exception if no match is found.
See Last documentation
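As a rough illustration of the two LINQ routes for ints: Enumerable.Last(predicate) throws InvalidOperationException when nothing matches, while projecting to int? before LastOrDefault turns "no match" into null. The wrapper names TryLast and LastOrNull are assumptions made for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LastDemo
{
    // Wraps Last(predicate): reports "no match" through the return value instead of an exception.
    public static bool TryLast(IEnumerable<int> source, Func<int, bool> predicate, out int value)
    {
        try
        {
            value = source.Last(predicate);
            return true;
        }
        catch (InvalidOperationException)
        {
            value = 0;
            return false;
        }
    }

    // The int? trick: project to a nullable so that "no match" becomes null rather than 0.
    public static int? LastOrNull(IEnumerable<int> source, Func<int, bool> predicate)
    {
        return source.Where(predicate).Select(e => (int?)e).LastOrDefault();
    }

    public static void Main()
    {
        var myList = new List<int> { 1, 5, 6, 20, 18, 2, 3, 0, 4 };

        int found;
        Console.WriteLine(TryLast(myList, e => e == 0 || e == 2, out found)); // True (found is 0)
        Console.WriteLine(TryLast(myList, e => e > 100, out found));          // False
        Console.WriteLine(LastOrNull(myList, e => e > 100).HasValue);         // False
    }
}
```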
In the general case, when we have an IEnumerable<T> with arbitrary T (so we can't play the int? trick any more),
we can implement an extension method:
public static partial class EnumerableExtensions {
public static int LastIndex<T>(this IEnumerable<T> source,
Predicate<T> predicate) {
if (source is null)
throw new ArgumentNullException(nameof(source));
if (predicate is null)
throw new ArgumentNullException(nameof(predicate));
int result = -1;
int index = -1;
foreach (T item in source) {
index += 1;
if (predicate(item))
result = index;
}
return result;
}
}
And then
var lastMatch = myList.LastIndex(e => e == 0 || e == 2);
Examples: Suppose the predicate is i == 0.
Then
[1] -> [(1)]
[0] -> []
[1, 0] -> [(1)]
[0, 1] -> [(1)]
[0, 0] -> []
[1, 1, 0] -> [(1, 1)]
[1, 0, 1] -> [(1), (1)]
[1, 1, 0, 0, 1, 0, 1, 1, 1] -> [(1, 1), (1), (1, 1, 1)]
Basically, returning contiguous subsegments where the predicate is false.
I thought this would work
internal static IEnumerable<IEnumerable<T>> PartitionBy<T>(this IEnumerable<T> source, Func<T, bool> condition)
{
IEnumerator<T> mover = source.GetEnumerator();
for (; mover.MoveNext() ; )
{
var chunk = mover.MoveUntil(condition);
if (chunk.Any())
{
yield return chunk;
}
}
}
private static IEnumerable<T> MoveUntil<T>(this IEnumerator<T> mover, Func<T, bool> condition)
{
bool hitCondition = false;
do
{
if (condition(mover.Current))
{
hitCondition = true;
}
else
{
yield return mover.Current;
}
}
while (!hitCondition && mover.MoveNext());
}
but I was seeing that for example with [1, 1, 0] it will return [(1), (1)]. I don't completely understand why. I can make it work if I change
var chunk = mover.MoveUntil(condition);
to have mover.MoveUntil(condition).ToList(); but if possible I'd like to not have to hold any of the subsegments in memory.
It is possible to express this purely with LINQ calls. Some properties of the implementation below:
It does not create temporary Lists explicitly. Note, however, that GroupBy buffers the whole source into its groupings when the result is first enumerated, so memory use is O(n) rather than O(1) and the sub-segments are not truly streamed.
There is no double enumeration, and the predicate is called exactly once per element each time the result is enumerated.
The runtime is O(n): as this answer suggests, the GroupBy operation should be O(n), and the other LINQ calls are single-pass operations, so they are also O(n).
public static IEnumerable<IEnumerable<T>> PartitionBy<T>(this IEnumerable<T> a, Func<T, bool> predicate)
{
int groupNumber = 0;
Func<bool, int?> getGroupNumber = skip =>
{
if (skip)
{
// prepare next group, we don't care if we increment more than once
// we only want to split groups
groupNumber++;
// null, to be able to filter out group separators
return null;
}
return groupNumber;
};
return a
.Select(x => new { Value = x, GroupNumber = getGroupNumber(predicate(x))} )
.Where(x => x.GroupNumber != null)
.GroupBy(x => x.GroupNumber)
.Select(g => g.Select(x => x.Value));
}
Firstly, I think you wanted O(n) as memory complexity, as your output length is linearly proportional to the input. As a big fan of functional programming, I have chosen to use a fold (that corresponds to the LINQ function Aggregate in C#).
Basically, we start with an empty collection of collections and a flag that indicates whether the next iteration has to create a new sub-collection (we only learn that when the predicate matches, i.e. in the previous iteration). I use a tuple containing those two elements as the accumulator. I extracted the logic of the Aggregate into a separate function for clarity.
static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> a, Func<T, bool> predicate)
{
// The accumulator is a tuple defined as: (collection, mustCreateNewList)
return a.Aggregate((new List<List<T>>(), true), (acc, e) => ForEachElement(acc, e, predicate)).Item1;
}
static (List<List<T>>, bool) ForEachElement<T>((List<List<T>>, bool) acc, T e, Func<T, bool> predicate)
{
var (collection, mustCreateNewList) = acc;
// The predicate matches, continue to iterate!
if (predicate(e)) return (collection, true);
// The previous iteration requests to create a new list
if(mustCreateNewList) collection.Add(new List<T>());
// Add the current element to the last list
collection[collection.Count - 1].Add(e);
return (collection, false);
}
The input collection is walked exactly once (O(n) time), and in the worst case the output holds as many elements as the input (O(n) memory).
Example of call:
var array = new int[] { 1, 1, 0, 0, 1, 0, 1, 1, 1 };
var result = array.Partition(i => i == 0);
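For comparison with the partitioning answers above, here is a sketch of a plain iterator that buffers only the segment currently being built, so at most one sub-segment is held by the method at a time. It is not one of the answers' implementations; the names PartitionBuffered and isSeparator are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartitionDemo
{
    // Yields each maximal run of elements for which isSeparator is false.
    // Only the run currently being collected is buffered.
    public static IEnumerable<List<T>> PartitionBuffered<T>(
        this IEnumerable<T> source, Func<T, bool> isSeparator)
    {
        var segment = new List<T>();
        foreach (T item in source)
        {
            if (!isSeparator(item))
            {
                segment.Add(item);
            }
            else if (segment.Count > 0)
            {
                // A separator closes a non-empty run; start a fresh list for the next one.
                yield return segment;
                segment = new List<T>();
            }
        }
        if (segment.Count > 0)
        {
            yield return segment;
        }
    }

    public static void Main()
    {
        var array = new[] { 1, 1, 0, 0, 1, 0, 1, 1, 1 };
        var parts = array.PartitionBuffered(i => i == 0)
                         .Select(p => "(" + string.Join(", ", p) + ")");
        Console.WriteLine(string.Join(", ", parts)); // (1, 1), (1), (1, 1, 1)
    }
}
```

The trade-off is that each run is materialised as a List<T>, which the question wanted to avoid, but it sidesteps the pitfalls of sharing one enumerator between lazily evaluated chunks.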
I'd created a Map function in C# to act, in many ways, as its JavaScript equivalent to project object types. I've since renamed these methods to 'Select' to use as overloads so they feel more 'integrated'. This is a chain, so bear with me, but the affected functions look like this...
public static TResult Project<TInput, TResult>(this TInput input, Func<TInput, TResult> projectionMapping)
=> projectionMapping(input);
public static TResult Project<TInput, TAccumulatedValue, TIncrementingValue, TResult>(this TInput input, Func<TInput, TAccumulatedValue, TResult> projectionMapping,
Func<TAccumulatedValue, TIncrementingValue, TAccumulatedValue> accumulator, TAccumulatedValue initialAccumulatorValue, TIncrementingValue increment)
=> projectionMapping(input, accumulator(initialAccumulatorValue, increment));
public static IEnumerable<TResult> Select<TInput, TAccumulatedValue, TIncrementingValue, TResult>(this IEnumerable<TInput> input,
Func<TInput, TAccumulatedValue, TResult> projectionMapping, Func<TAccumulatedValue, TIncrementingValue, TAccumulatedValue> accumulator,
TAccumulatedValue initialAccumulatorValue, TIncrementingValue increment)
=> input.Select(item => item.Project(projectionMapping, accumulator, initialAccumulatorValue, increment));
// This doesn't work.
public static IEnumerable<TResult> Select<TInput, TResult>(this IEnumerable<TInput> input,
Func<TInput, int, TResult> projectionMapping, int initialAccumulatorValue = -1, int increment = 1)
{
return input.Select(projectionMapping, (acc, inc) => acc + inc,
initialAccumulatorValue, increment);
}
I am using the int version of the map method, with the accumulator written into it, as follows...
MyList.Add(new List<MyObject>(rowValues.Map((val, headerNumber)
=> new MyObject(headerNumber, val), 0, 10)));
The problem is that the value of headerNumber never changes (it's always 10). The accumulator runs for each mapping, but it's not remembering its accumulation between runs. I feel I'm missing something glaringly obvious here but I can't see the wood for the trees.
If I input (for example) an array like this...
rowValues = new string[] { "Item 1", "Item 2", "Item 3" };
I would expect a list of MyObject items that contain the following data...
10 | "Item 1"
20 | "Item 2"
30 | "Item 3"
The problem is that you always call the accumulator with initialAccumulatorValue.
To achieve the goal, you need to maintain the accumulated value between elements, and the easiest correct way to do that is with a C# iterator method:
public static IEnumerable<TResult> Map<TInput, TAccumulatedValue, TIncrementingValue, TResult>(this IEnumerable<TInput> input,
Func<TInput, TAccumulatedValue, TResult> projectionMapping, Func<TAccumulatedValue, TIncrementingValue, TAccumulatedValue> accumulator,
TAccumulatedValue initialAccumulatorValue, TIncrementingValue increment)
{
var accumulatedValue = initialAccumulatorValue;
foreach (var item in input)
yield return projectionMapping(item, accumulatedValue = accumulator(accumulatedValue, increment));
}
Please note that the naïve attempt to use a combination of closure and Select
var accumulatedValue = initialAccumulatorValue;
return input.Select(item => projectionMapping(item, accumulatedValue = accumulator(accumulatedValue, increment)));
simply doesn't work, because accumulatedValue is shared by every execution of the returned Select query: a second enumeration continues from where the first one stopped and so produces incorrect results. The iterator method has no such issue, because its body, including the initialization of accumulatedValue, runs afresh for every enumerator obtained from GetEnumerator().
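The difference between the two approaches shows up as soon as the result is enumerated twice. The sketch below uses simplified, hypothetical helpers (WithClosure and WithIterator, with a fixed int accumulator) rather than the generic Map signature.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SharedStateDemo
{
    // Closure + Select: the captured variable outlives any single enumeration.
    public static IEnumerable<string> WithClosure(IEnumerable<string> input, int start, int increment)
    {
        int accumulated = start;
        return input.Select(item => item + (accumulated += increment));
    }

    // Iterator method: the body, including the initialization, reruns for each enumeration.
    public static IEnumerable<string> WithIterator(IEnumerable<string> input, int start, int increment)
    {
        int accumulated = start;
        foreach (string item in input)
        {
            yield return item + (accumulated += increment);
        }
    }

    public static void Main()
    {
        var items = new[] { "a", "b", "c" };

        IEnumerable<string> closure = WithClosure(items, 0, 10);
        Console.WriteLine(string.Join(", ", closure)); // a10, b20, c30
        Console.WriteLine(string.Join(", ", closure)); // a40, b50, c60

        IEnumerable<string> iterator = WithIterator(items, 0, 10);
        Console.WriteLine(string.Join(", ", iterator)); // a10, b20, c30
        Console.WriteLine(string.Join(", ", iterator)); // a10, b20, c30
    }
}
```

Any approach that keeps the counter in a captured local and returns a lazy Select query behaves like WithClosure here, unless the caller materialises the result immediately with ToList() or similar.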
I started by changing your 3rd function so that it just takes an accumulator function that returns the next index in the sequence. This allows the function to have state which you need to calculate the increasing accumulator values.
public static IEnumerable<TResult> Project<TInput, TAccumulatorValue, TResult>(this IEnumerable<TInput> input,
Func<TInput, TAccumulatorValue, TResult> projectionMapping,
Func<TAccumulatorValue> accumulator)
{
return input.Select(item => projectionMapping(item, accumulator()));
}
Then the overload that takes the range arguments (the one that didn't work) can be written like this, which solves your problem.
public static IEnumerable<TResult> Project<TInput, TResult>(this IEnumerable<TInput> input,
Func<TInput, int, TResult> projectionMapping, int initialAccumulatorValue = 0, int increment = 1)
{
int curValue = initialAccumulatorValue;
return input.Project(projectionMapping,
() => { var ret = curValue; curValue += increment; return ret; });
}
Alternatively
Thinking about this problem in a different way, you can make it more generic. All you are really doing is combining two sequences, using projectionMapping to combine the elements. In this case the second sequence happens to contain the accumulator values. To use this you just call the standard LINQ function Zip, passing in the accumulator sequence and the projectionMapping function.
Enumerable.Range takes a start and a count and always steps by 1, so to get a sequence with a different step we need to write a Range generator like this (it is infinite, which is fine here because Zip stops at the end of the shorter sequence)
public static IEnumerable<int> Range(int start, int increment)
{
for (; ; )
{
yield return start;
start += increment;
}
}
Examples
Showing both solutions in action
var items = new[] { "a", "b", "c" };
// use Project, returns: a10, b20, c30
var results = items.Project((s, i) => s + i.ToString(), 10, 10).ToList();
// use Zip with custom range, returns: a10, b20, c30
var results2 = items.Zip(Range(10, 10), (s, i) => s + i.ToString()).ToList();
// use Zip with standard range, returns: a1, b2, c3
var results3 = items.Zip(Enumerable.Range(1, int.MaxValue), (s, i) => s + i.ToString()).ToList();
Assuming the following
public class MyObject
{
public int Row { get; }
public string Value { get; }
public MyObject(int row, string value)
{
Value = value;
Row = row;
}
}
With the input source being:
var rows = new[] { "Item 1", "Item 2", "Item 3", "Item 4" };
Solution
To achieve the specific result for the input you've presented, you can simply use the indexed Select overload that the framework already provides:
var result = rows.Select((v, i) => new MyObject((i+1)*10, v)).ToList();
Granted, that doesn't look very nice, but does the trick.
Ivan's answer is neater, but I'll leave the above in case someone finds it useful.
I'm running through Microsoft's 101 LINQ Samples, and I'm stumped on how this query knows how to assign the correct int value to the correct int field:
public void Linq12()
{
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
var numsInPlace = numbers.Select((num, index) => new { Num = num, InPlace = (num == index) });
Console.WriteLine("Number: In-place?");
foreach (var n in numsInPlace)
{
Console.WriteLine("{0}: {1}", n.Num, n.InPlace);
}
}
I saw in SO #336758 that there have been errors in the examples before, but it is much more likely that I am just missing something.
Could someone explain this and how the compiler knows how to interpret this data correctly?
EDIT:
OK, I think my confusion comes from the LINQ extension that enables the Select feature to work. The Func and two int parameters IEnumerable<TResult> IEnumerable<int>.Select(Func<int,int,TResult> selector) are most likely the key to my lack of understanding.
I'm not really sure what you are asking, but Select iterates over the list starting at index 0. If the value of the element at the current index is equal to the index, it sets the InPlace property of the anonymous object to true. My guess is that the code above prints true for 3, 6 and 7, right?
It would also be easier to explain if you wrote what exactly you don't understand.
Jon Skeet has written a series of blog posts in which he reimplements LINQ; read about Select here: Reimplementation of Select
UPDATE: I noticed from one of your comments on another answer that it seems to be the lambda, and not LINQ itself, that is confusing you. If you read Skeet's blog post you will see that Select has two overloads:
public static IEnumerable<TResult> Select<TSource, TResult>(
this IEnumerable<TSource> source,
Func<TSource, TResult> selector)
public static IEnumerable<TResult> Select<TSource, TResult>(
this IEnumerable<TSource> source,
Func<TSource, int, TResult> selector)
The Select with index matches the second overload. As you can see, it is an extension of IEnumerable<TSource>, which in your case is the array of ints; you are therefore calling Select on an IEnumerable<int>, and the signature of Select becomes: Select<int, TResult>(this IEnumerable<int> source, Func<int, int, TResult> selector). I replaced TSource with int, since that is the element type of your IEnumerable<int>. TResult stays generic because you are returning an anonymous type. Does that explain some of it?
Looks correct to me.
First you have an anonymous type with Num and InPlace being created. Then the LINQ Select just iterates over the elements, passing each element together with its index. If you were to rewrite it without LINQ and anonymous classes, it would look like this:
class NumsInPlace
{
public int Num { get; set; }
public bool InPlace { get; set; }
}
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
List<NumsInPlace> numsInPlace = new List<NumsInPlace>();
for (int index = 0; index < numbers.Length; index++)
{
int num = numbers[index];
numsInPlace.Add(new NumsInPlace() { Num = num, InPlace = (index == num) });
}
Console.WriteLine("Number: In-place?");
foreach (var n in numsInPlace)
{
Console.WriteLine("{0}: {1}", n.Num, n.InPlace);
}
The MSDN on Enumerable.Select has the details, but the projection function (num, index) always has the item first, then the index second (if supplied).
how does the compiler know that LINQ Select is using the index value as an index to the array?
The compiler doesn't know about index values. The implementation of that overload of Select knows about index values.
//all the compiler sees is a method that accepts 2 int parameters and returns a bool.
Func<int, int, bool> twoIntFunc = (x, y) => (x == y);
//The compiler sees there's an overload of Enumerable.Select which accepts such a method.
IEnumerable<bool> query = numbers.Select(twoIntFunc);
Enumerable.Select's implementation does the rest of the work by calling that method with the appropriate parameters.
The first argument to selector represents the element to process. The second argument to selector represents the zero-based index of that element in the source sequence.
So - Select will call your method with (5, 0) first, then Select calls it with (4, 1).
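To make "the implementation does the rest" concrete, here is a sketch of what an indexed Select could look like. SelectWithIndex is a hypothetical stand-in written for this example, not the framework's source; the demo checks that it agrees with the built-in overload.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectDemo
{
    // Hypothetical stand-in for the indexed Select overload.
    // The method itself keeps the counter and hands it to the selector.
    public static IEnumerable<TResult> SelectWithIndex<TSource, TResult>(
        this IEnumerable<TSource> source, Func<TSource, int, TResult> selector)
    {
        int index = 0;
        foreach (TSource item in source)
        {
            yield return selector(item, index);
            index++;
        }
    }

    public static void Main()
    {
        int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

        // Keep only the numbers that sit at their own index.
        var inPlace = numbers
            .SelectWithIndex((num, index) => new { Num = num, InPlace = (num == index) })
            .Where(x => x.InPlace)
            .Select(x => x.Num);
        Console.WriteLine(string.Join(", ", inPlace)); // 3, 6, 7

        // The built-in overload produces the same flags.
        bool same = numbers.Select((num, index) => num == index)
                           .SequenceEqual(numbers.SelectWithIndex((num, index) => num == index));
        Console.WriteLine(same); // True
    }
}
```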
The lambda expression (num, index) => new { Num = num, InPlace = (num == index) } is executed once for each element in the input sequence and is passed the item and its index as the arguments.
Update
Lambda expressions can be implicitly typed; that is, from the required delegate type, the compiler can infer what types you intend the arguments to be.
(num, index) => new { Num = num, InPlace = (num == index) }
is equivalent to
someAnonymousType MyMethod(int num, int index)
{
return new
{
Num = num,
InPlace = (num == index)
};
}
Obviously you can't write the latter because you can't type the name of an anonymous type, but the compiler can ;)
The compiler knows this because the overload of Select that you're using accepts a Func<TSource, Int32, TResult>: a Func that takes two arguments, one of type TSource (the element type of your IEnumerable<T>, in this case int) and an Int32 (which represents the index), and returns a TResult, which is whatever you choose to return from your function, in this case an anonymous type.
The lambda can be converted to the required delegate type, and therefore it just works.
The second argument in the Select lambda is the index, which increments as Select traverses the array of numbers at run time. The selector will see
num   index   num == index
5     0       false
4     1       false
1     2       false
3     3       true
9     4       false
8     5       false
6     6       true
7     7       true
2     8       false
0     9       false
It yields true for 3, 6 and 7. You need to remember that the index starts at 0.
In this post the solution to the problem is:
list.Where((item, index) => index < list.Count - 1 && list[index + 1] == item)
The concept of multiple parameters (i.e. (item, index)) is a bit puzzling to me, and I don't know the correct term to narrow down my Google results. So 1) What is that called? And more importantly, 2) How is the non-enumerable variable initialized? In this case, how is index compiled as an int and initialized to 0?
Thanks.
Lambda expressions have various syntax options:
() => ... // no parameters
x => ... // single parameter named x, compiler infers type
(x) => ... // single parameter named x, compiler infers type
(int x) => ... // single parameter named x, explicit type
(x, y) => ... // two parameters, x and y; compiler infers types
(int x, string y) => ... // two parameters, x and y; explicit types
The subtlety here is that Where has an overload that accepts a Func<T, int, bool>, representing the value and index respectively (and returning the bool for the match). So it is the Where implementation that supplies the index - something like:
static class Example
{
public static IEnumerable<T> Where<T>(this IEnumerable<T> source,
Func<T, int, bool> predicate)
{
int index = 0;
foreach (var item in source)
{
if (predicate(item, index++)) yield return item;
}
}
}
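Putting the pieces together, the query from the question can be run against a small sample list. The list contents and the names WhereIndexDemo and ItemsEqualToNext are assumptions for this sketch; the call itself uses the built-in indexed Where overload.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhereIndexDemo
{
    // Returns every item that is immediately followed by an equal item.
    public static List<int> ItemsEqualToNext(List<int> list)
    {
        return list
            .Where((item, index) => index < list.Count - 1 && list[index + 1] == item)
            .ToList();
    }

    public static void Main()
    {
        var list = new List<int> { 1, 2, 2, 3, 3, 3, 4 };
        // Matches at indexes 1, 3 and 4.
        Console.WriteLine(string.Join(", ", ItemsEqualToNext(list))); // 2, 3, 3
    }
}
```

Nothing in the calling code declares or initializes index; its type comes from the Func<T, int, bool> parameter and its value from the counter inside Where.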
When using LINQ, remember that you are passing a method delegate to the Where method. The particular overload of Where that you are invoking takes a method with signature Func<T,int,bool>, and will call this method for each item in list. Internally, this particular method is keeping count for every item iterated, and calling the supplied delegate using this value as the second parameter:
var result = suppliedDelegate(item, count);
This answer's a little more technical... Remember that lambdas are simply syntactic shortcuts for anonymous delegates (which are anonymous methods).
Edit: They can also be expression trees depending on the signature of Where (see Marc's comment).
list.Where((item, index) => index < list.Count - 1 && list[index + 1] == item)
is functionally equivalent to each of the following (assuming list is a List<int>):
// inline anonymous method, no lambda
list.Where(delegate(int item, int index) { return index < list.Count - 1 && list[index + 1] == item; });
// if we assign the lambda to a local variable (the delegate type must be stated; var cannot infer it here):
Func<int, int, bool> lambdaDelegate = (item, index) => index < list.Count - 1 && list[index + 1] == item;
list.Where(lambdaDelegate);
// without using lambdas as a shortcut:
Func<int, int, bool> anonymousDelegate = delegate(int item, int index)
{
return index < list.Count - 1 && list[index + 1] == item;
};
list.Where(anonymousDelegate);
// and if we don't use anonymous methods at all (a local function, or a method that has list in scope):
bool MyDelegate(int item, int index)
{
return index < list.Count - 1 && list[index + 1] == item;
}
list.Where(MyDelegate);
The Where method has the following signature:
public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, int, bool> predicate);
which is conceptually equivalent to:
delegate bool WhereDelegate<TSource>(TSource item, int index);
public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, WhereDelegate<TSource> predicate);
That's where the item and index are defined.
Behind the scenes, Where may do something like (just a guess, you can decompile to see):
public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, int, bool> predicate)
{
int index = 0;
foreach (TSource item in source)
{
if (predicate(item, index))
yield return item;
index++;
}
}
So that's where index is initialized and gets passed to your delegate (anonymous, lambda, or otherwise).