Does Take(x) in LINQ stop enumerating after taking x objects?


For example, if I have this code:
public static void Main(string[] args)
{
List<int> list = new List<int>() { 2, 3, 2, 9, 10, 2, 5 };
var result = list.Where(x => x == 2).Take(2).ToList(); // "out" is a C# keyword and can't be a variable name
}
Is the number of iterations 3 (since the second 2 is at index 2) or 7 (the total number of elements)?
Thanks

Yes, it stops.
You can see this clearly by rewriting the code as follows:
var result = list.Where(x =>
{
Console.WriteLine("Where: " + x);
return x == 2;
})
.Take(2).ToList();

list is iterated by Where, which returns only the matching items.
Where is iterated by Take, which stops after 2 results.
Take is fully iterated by ToList.
So the end result is that Take stops the iteration of list at the second 2 (index 2).
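To put a number on it, here is a small sketch that counts how many times the Where predicate actually runs. TakeDemo and CountPredicateCalls are hypothetical names used only for this illustration. With the question's list and Take(2), the predicate should run 3 times (for 2, 3 and the second 2), not 7.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TakeDemo
{
    // Returns how many times the Where predicate ran before Take stopped pulling items.
    public static int CountPredicateCalls(List<int> list, int take)
    {
        int predicateCalls = 0;
        list.Where(x => { predicateCalls++; return x == 2; })
            .Take(take)
            .ToList(); // ToList drives the whole pipeline
        return predicateCalls;
    }
}
```

Calling TakeDemo.CountPredicateCalls(new List<int> { 2, 3, 2, 9, 10, 2, 5 }, 2) should return 3.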

You can easily check it yourself. Let's test the hypothesis that 9 is reached (i.e. that at least 4 items have been iterated). The check has to come before Where, because only the matching 2s ever get past Where:
var result = list
.Select(item => item != 9 // business as usual for the first 3 items
? item // throw exception on the 4th
: throw new Exception("Strange execution: 9 (4th item) has been scanned"))
.Where(x => x == 2) // your query under test
.Take(2)
.ToList(); // materialization executes the query
Run it and you'll see that the 4th item (9) is never reached: no exception is thrown.

I think the most convincing (and simple) answer is to look at the source code of TakeIterator that runs when Take is called:
static IEnumerable<TSource> TakeIterator<TSource>(IEnumerable<TSource> source, int count)
{
if (count > 0) {
foreach (TSource element in source) {
yield return element;
if (--count == 0) break; // Yep, it stops after "count" iterations
}
}
}

If you write some test code with your own IEnumerable and IEnumerator, it will be easy to see what happens.
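class MyCollection : IEnumerable<int>
{
public List<int> Data {get; set;} = new List<int>() { 2, 3, 2, 9, 10, 2, 5 };
public IEnumerator<int> GetEnumerator()
{
return new MyEnumerator()
{
Data = this.Data,
};
}
// Required by the non-generic IEnumerable interface (needs using System.Collections;)
IEnumerator IEnumerable.GetEnumerator() => this.GetEnumerator();
}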
And the enumerator:
class MyEnumerator : IEnumerator<int>
{
private int index = -1;
public List<int> Data {get; set;}
public void Reset()
{
this.index = -1;
}
public bool MoveNext()
{
++this.index;
return this.index < this.Data.Count;
}
public int Current
{
get
{
int returnValue = this.Data[this.index];
Debug.WriteLine("[{0}] {1}", this.index, returnValue);
return returnValue;
}
}
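// Members required by the non-generic IEnumerator and by IDisposable
object IEnumerator.Current => this.Current;
public void Dispose() { }
}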
Test code:
void Main()
{
MyCollection collection = new MyCollection();
var result = collection.Where(x => x == 2).Take(2).ToList();
}

Related

Getting the index of a sequence of items

I was trying to get the index of a sequence of items inside an IEnumerable<T>:
var collection = new[] { 1, 2, 3, 4, 5 };
var sequence = new[] { 2, 3 };
// IndexOf is an extension method.
collection.IndexOf(sequence); // Should return 1
I wrote an IndexOf extension method for this and it works fine unless the first item of the sequence appears more than once, consecutively, in the collection:
// There are two items that are 2, consecutively in the collection,
// which is the first item of the sequence.
var collection = new[] { 1, 2, 2, 3, 4, 5 };
var sequence = new[] { 2, 3 };
collection.IndexOf(sequence); // Should return 2 but returns -1
Here is the IndexOf method:
public static int IndexOf<T>(this IEnumerable<T> collection,
IEnumerable<T> sequence)
{
var comparer = EqualityComparer<T>.Default;
var counter = 0;
var index = 0;
var seqEnumerator = sequence.GetEnumerator();
foreach (var item in collection)
if (seqEnumerator.MoveNext())
{
if (!comparer.Equals(item, seqEnumerator.Current))
{
seqEnumerator.Dispose();
seqEnumerator = sequence.GetEnumerator();
counter = 0;
// UPDATED AFTER MICHAEL'S ANSWER,
// IT WORKS WITH THIS ADDED PART:
seqEnumerator.MoveNext();
if (comparer.Equals(item, seqEnumerator.Current))
counter++;
}
else counter++;
index++;
}
else break;
var done = !seqEnumerator.MoveNext();
seqEnumerator.Dispose();
return done ? index - counter : -1;
}
I couldn't figure out how to fix this.

public static int IndexOf<T>(this IEnumerable<T> collection,
IEnumerable<T> sequence)
{
var ccount = collection.Count();
var scount = sequence.Count();
if (scount > ccount) return -1;
if (collection.Take(scount).SequenceEqual(sequence)) return 0;
int index = Enumerable.Range(1, ccount - scount + 1)
.FirstOrDefault(i => collection.Skip(i).Take(scount).SequenceEqual(sequence));
if (index == 0) return -1;
return index;
}

When you encounter a mismatching item at a position other than the first, you restart the sequence enumerator, but you don't check whether the current item matches the start of the sequence, so you never actually compare the second 2 in the collection with the 2 in the sequence.
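For comparison, here is a brute-force sketch that sidesteps the restart problem by checking the whole sequence at every start position. IndexOfSequence is a hypothetical name chosen so it does not clash with the IndexOf above. It buffers both inputs into lists, so it gives up streaming, and it is O(n·m) in the worst case; a linear-time search would need something like Knuth-Morris-Pratt.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SequenceSearch
{
    // Tries every start position and compares the whole sequence there.
    // Returns the index of the first match, or -1 if the sequence does not occur.
    public static int IndexOfSequence<T>(this IEnumerable<T> collection,
                                         IEnumerable<T> sequence,
                                         IEqualityComparer<T> comparer = null)
    {
        comparer ??= EqualityComparer<T>.Default;
        // Buffer both inputs once so they are not re-enumerated for every start position.
        IList<T> source = collection as IList<T> ?? collection.ToList();
        IList<T> pattern = sequence as IList<T> ?? sequence.ToList();

        for (int start = 0; start + pattern.Count <= source.Count; start++)
        {
            int matched = 0;
            while (matched < pattern.Count &&
                   comparer.Equals(source[start + matched], pattern[matched]))
            {
                matched++;
            }
            if (matched == pattern.Count) return start;
        }
        return -1;
    }
}
```

With this, new[] { 1, 2, 2, 3, 4, 5 }.IndexOfSequence(new[] { 2, 3 }) gives 2, and overlapping prefixes such as { 1, 1, 2 } inside { 1, 1, 1, 2 } are found at index 1.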

LINQ to count consecutive repeated items (int) in an int array?

Here is the scenario behind my question: I have an array, say:
{ 4, 1, 1, 3, 3, 2, 5, 3, 2, 2 }
The result should be something like this (array element => its count):
4 => 1
1 => 2
3 => 2
2 => 1
5 => 1
3 => 1
2 => 2
I know this can be achieved with a for loop, but I've googled a lot for a way to do it in fewer lines of code using LINQ, without success.

I believe the best way to do this is to create a "LINQ-like" extension method using an iterator block. This lets you perform the calculation in a single pass over your data. Note that performance isn't important at all if you just want to perform the calculation on a small array of numbers. Of course, this is really your for loop in disguise.
static class Extensions {
public static IEnumerable<Tuple<T, Int32>> ToRunLengths<T>(this IEnumerable<T> source) {
using (var enumerator = source.GetEnumerator()) {
// Empty input leads to empty output.
if (!enumerator.MoveNext())
yield break;
// Retrieve first item of the sequence.
var currentValue = enumerator.Current;
var runLength = 1;
// Iterate the remaining items in the sequence.
while (enumerator.MoveNext()) {
var value = enumerator.Current;
if (!Equals(value, currentValue)) {
// A new run is starting. Return the previous run.
yield return Tuple.Create(currentValue, runLength);
currentValue = value;
runLength = 0;
}
runLength += 1;
}
// Return the last run.
yield return Tuple.Create(currentValue, runLength);
}
}
}
Note that the extension method is generic and you can use it on any type. Values are compared for equality using Object.Equals. However, if you want to you could pass an IEqualityComparer<T> to allow for customization of how values are compared.
You can use the method like this:
var numbers = new[] { 4, 1, 1, 3, 3, 2, 5, 3, 2, 2 };
var runLengths = numbers.ToRunLengths();
For your input data the result will be these tuples:
4 1
1 2
3 2
2 1
5 1
3 1
2 2
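The IEqualityComparer<T> variant mentioned above might look like the following sketch. The overload is hypothetical (not part of any library); it keeps the same single-pass algorithm and Tuple<T, Int32> output, and splits off a private iterator so the null check runs eagerly instead of on first enumeration.

```csharp
using System;
using System.Collections.Generic;

static class RunLengthExtensions
{
    // Same single-pass algorithm as ToRunLengths, but equality is decided by a caller-supplied comparer.
    public static IEnumerable<Tuple<T, int>> ToRunLengths<T>(
        this IEnumerable<T> source, IEqualityComparer<T> comparer)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return RunLengthIterator(source, comparer ?? EqualityComparer<T>.Default);
    }

    private static IEnumerable<Tuple<T, int>> RunLengthIterator<T>(
        IEnumerable<T> source, IEqualityComparer<T> comparer)
    {
        using (var enumerator = source.GetEnumerator())
        {
            if (!enumerator.MoveNext()) yield break;   // empty input, empty output
            var currentValue = enumerator.Current;
            var runLength = 1;
            while (enumerator.MoveNext())
            {
                var value = enumerator.Current;
                if (comparer.Equals(value, currentValue))
                {
                    runLength++;                        // still inside the current run
                }
                else
                {
                    yield return Tuple.Create(currentValue, runLength);  // previous run ended
                    currentValue = value;
                    runLength = 1;
                }
            }
            yield return Tuple.Create(currentValue, runLength);          // last run
        }
    }
}
```

For example, words.ToRunLengths(StringComparer.OrdinalIgnoreCase) would treat "a" and "A" as one run.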

(Adding another answer to avoid the two upvotes for my deleted one counting towards this...)
I've had a little think about this (now I've understood the question) and it's really not clear how you'd do this nicely in LINQ. There are definitely ways that it could be done, potentially using Zip or Aggregate, but they'd be relatively unclear. Using foreach is pretty simple:
// Simplest way of building an empty list of an anonymous type...
var results = new[] { new { Value = 0, Count = 0 } }.Take(0).ToList();
// TODO: Handle empty arrays
int currentValue = array[0];
int currentCount = 1;
foreach (var value in array.Skip(1))
{
if (currentValue != value)
{
results.Add(new { Value = currentValue, Count = currentCount });
currentCount = 0;
currentValue = value;
}
currentCount++;
}
// Handle tail, which we won't have emitted yet
results.Add(new { Value = currentValue, Count = currentCount });
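For the Aggregate route mentioned above, one possible sketch follows, assuming value tuples (C# 7.3+) and a List as the accumulator; RunLengthAggregate and Runs are illustrative names. It is still the foreach in disguise, but it handles the empty-array TODO for free because the seed list simply stays empty.

```csharp
using System.Collections.Generic;
using System.Linq;

static class RunLengthAggregate
{
    // Builds (Value, Count) runs in a single Aggregate pass over the input.
    public static List<(int Value, int Count)> Runs(int[] array) =>
        array.Aggregate(
            new List<(int Value, int Count)>(),
            (runs, value) =>
            {
                int last = runs.Count - 1;
                if (last >= 0 && runs[last].Value == value)
                    runs[last] = (value, runs[last].Count + 1);  // extend the current run
                else
                    runs.Add((value, 1));                         // start a new run
                return runs;                                      // same list instance every step
            });
}
```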

Here's a LINQ expression that works (edit: tightened up code just a little more):
var data = new int[] { 4, 1, 1, 3, 3, 2, 5, 3, 2, 2 };
var result = data.Select ((item, index) =>
new
{
Key = item,
Count = (index == 0 || data.ElementAt(index - 1) != item)
? data.Skip(index).TakeWhile (d => d == item).Count ()
: -1
}
)
.Where (d => d.Count != -1);
And here's a proof that shows it working.

Is this not short enough?
public static IEnumerable<KeyValuePair<T, int>> Repeats<T>(
this IEnumerable<T> source)
{
int count = 0;
T lastItem = source.First();
foreach (var item in source)
{
if (Equals(item, lastItem))
{
count++;
}
else
{
yield return new KeyValuePair<T, int>(lastItem, count);
lastItem = item;
count = 1;
}
}
yield return new KeyValuePair<T, int>(lastItem, count);
}
I'd be interested to see a LINQ way.

I already wrote the method you need over there. Here's how to call it.
foreach(var g in numbers.GroupContiguous(i => i))
{
Console.WriteLine("{0} => {1}", g.Key, g.Count);
}

Behold (you can run this directly in LINQPad -- rle is where the magic happens):
var xs = new[] { 4, 1, 1, 3, 3, 2, 5, 3, 2, 2 };
var rle = Enumerable.Range(0, xs.Length)
.Where(i => i == 0 || xs[i - 1] != xs[i])
.Select(i => new { Key = xs[i], Count = xs.Skip(i).TakeWhile(x => x == xs[i]).Count() });
Console.WriteLine(rle);
Of course, this is O(n^2), but you didn't request linear efficiency in the spec.

var array = new int[] {1,1,2,3,5,6,6 };
foreach (var g in array.GroupBy(i => i))
{
Console.WriteLine("{0} => {1}", g.Key, g.Count());
}

var array = new int[]{}; // whatever your array is
array.Select(s => array.Where(s2 => s == s2).Count());
The only problem with this is that if 1 appears twice, you will get the result for 1 twice.

var array = new int[] {1,1,2,3,5,6,6 };
var arrayd = array.Distinct();
var arrayl= arrayd.Select(s => { return array.Where(s2 => s2 == s).Count(); }).ToArray();
Output
arrayl=[0]2 [1]1 [2]1 [3]1 [4]2

Try GroupBy through List<int>
List<int> list = new List<int>() { 4, 1, 1, 3, 3, 2, 5, 3, 2, 2 };
var res = list.GroupBy(val => val);
foreach (var v in res)
{
MessageBox.Show(v.Key.ToString() + "=>" + v.Count().ToString());
}

Interleaving multiple (more than 2) irregular lists using LINQ

Say I have the following data
IEnumerable<IEnumerable<int>> items = new IEnumerable<int>[] {
new int[] { 1, 2, 3, 4 },
new int[] { 5, 6 },
new int[] { 7, 8, 9 }
};
What would be the easiest way to return a flat list with the items interleaved so I'd get the result:
1, 5, 7, 2, 6, 8, 3, 9, 4
Note: The number of inner lists is not known until runtime.

What you're describing is essentially a Transpose Method where overhanging items are included and the result is flattened. Here's my attempt:
static IEnumerable<IEnumerable<T>> TransposeOverhanging<T>(
this IEnumerable<IEnumerable<T>> source)
{
var enumerators = source.Select(e => e.GetEnumerator()).ToArray();
try
{
T[] g;
do
{
yield return g = enumerators
.Where(e => e.MoveNext()).Select(e => e.Current).ToArray();
}
while (g.Any());
}
finally
{
Array.ForEach(enumerators, e => e.Dispose());
}
}
Example:
var result = items.TransposeOverhanging().SelectMany(g => g).ToList();
// result == { 1, 5, 7, 2, 6, 8, 3, 9, 4 }

The solution below is very straightforward. As it turns out, it is also nearly twice as fast as the solution proposed by dtb.
private static IEnumerable<T> Interleave<T>(this IEnumerable<IEnumerable<T>> source )
{
var queues = source.Select(x => new Queue<T>(x)).ToList();
while (queues.Any(x => x.Any())) {
foreach (var queue in queues.Where(x => x.Any())) {
yield return queue.Dequeue();
}
}
}

Here's my attempt, based on dtb's answer. It avoids the external SelectMany and internal ToArray calls.
public static IEnumerable<T> Interleave<T>(this IEnumerable<IEnumerable<T>> source)
{
var enumerators = source.Select(e => e.GetEnumerator()).ToArray();
try
{
bool itemsRemaining;
do
{
itemsRemaining = false;
foreach (var item in
enumerators.Where(e => e.MoveNext()).Select(e => e.Current))
{
yield return item;
itemsRemaining = true;
}
}
while (itemsRemaining);
}
finally
{
Array.ForEach(enumerators, e => e.Dispose());
}
}

Disposes all enumerators, even when exceptions are thrown.
Evaluates the outer sequence eagerly, but uses lazy evaluation for the inner sequences.
public static IEnumerable<T> Interleave<T>(IEnumerable<IEnumerable<T>> sequences)
{
var enumerators = new List<IEnumerator<T>>();
try
{
// using foreach here ensures that `enumerators` contains all already obtained enumerators in case an exception is thrown here.
// this ensures proper disposal at the end
foreach(var enumerable in sequences)
{
enumerators.Add(enumerable.GetEnumerator());
}
var queue = new Queue<IEnumerator<T>>(enumerators);
while (queue.Any())
{
var enumerator = queue.Dequeue();
if (enumerator.MoveNext())
{
queue.Enqueue(enumerator);
yield return enumerator.Current;
}
}
}
finally
{
foreach(var enumerator in enumerators)
{
enumerator.Dispose();
}
}
}

Though it's not as elegant as dtb's answer, it also works and it's a one-liner :)
Enumerable.Range(0, items.Max(x => x.Count()))
.ToList()
.ForEach(x =>
{
items
.Where(lstChosen => lstChosen.Count()-1 >= x)
.Select(lstElm => lstElm.ElementAt(x))
.ToList().ForEach(z => Console.WriteLine(z));
});

Most efficient algorithm for merging sorted IEnumerable<T>

I have several huge sorted enumerable sequences that I want to merge. These lists are manipulated as IEnumerable but are already sorted. Since the input lists are sorted, it should be possible to merge them in one pass, without re-sorting anything.
I would like to keep the deferred execution behavior.
I tried to write a naive algorithm that does this (see below). However, it looks pretty ugly and I'm sure it can be optimized. There may be a more academic algorithm...
IEnumerable<T> MergeOrderedLists<T, TOrder>(IEnumerable<IEnumerable<T>> orderedlists,
Func<T, TOrder> orderBy)
{
var enumerators = orderedlists.ToDictionary(l => l.GetEnumerator(), l => default(T));
IEnumerator<T> tag = null;
var firstRun = true;
while (true)
{
var toRemove = new List<IEnumerator<T>>();
var toAdd = new List<KeyValuePair<IEnumerator<T>, T>>();
foreach (var pair in enumerators.Where(pair => firstRun || tag == pair.Key))
{
if (pair.Key.MoveNext())
toAdd.Add(pair);
else
toRemove.Add(pair.Key);
}
foreach (var enumerator in toRemove)
enumerators.Remove(enumerator);
foreach (var pair in toAdd)
enumerators[pair.Key] = pair.Key.Current;
if (enumerators.Count == 0)
yield break;
var min = enumerators.OrderBy(t => orderBy(t.Value)).FirstOrDefault();
tag = min.Key;
yield return min.Value;
firstRun = false;
}
}
The method can be used like that:
// Person lists are already sorted by age
MergeOrderedLists(orderedList, p => p.Age);
assuming the following Person class exists somewhere:
public class Person
{
public int Age { get; set; }
}
Duplicates should be conserved, we don't care about their order in the new sequence. Do you see any obvious optimization I could use?

Here is my fourth cut at it (thanks to @tanascius for pushing this along to something much more LINQ):
public static IEnumerable<T> MergePreserveOrder3<T, TOrder>(
this IEnumerable<IEnumerable<T>> aa,
Func<T, TOrder> orderFunc)
where TOrder : IComparable<TOrder>
{
var items = aa.Select(xx => xx.GetEnumerator()).Where(ee => ee.MoveNext())
.OrderBy(ee => orderFunc(ee.Current)).ToList();
while (items.Count > 0)
{
yield return items[0].Current;
var next = items[0];
items.RemoveAt(0);
if (next.MoveNext())
{
// simple sorted linear insert
var value = orderFunc(next.Current);
var ii = 0;
for ( ; ii < items.Count; ++ii)
{
if (value.CompareTo(orderFunc(items[ii].Current)) <= 0)
{
items.Insert(ii, next);
break;
}
}
if (ii == items.Count) items.Add(next);
}
else next.Dispose(); // woops! can't forget IDisposable
}
}
Results:
for (int p = 0; p < people.Count; ++p)
{
Console.WriteLine("List {0}:", p + 1);
Console.WriteLine("\t{0}", String.Join(", ", people[p].Select(x => x.Name)));
}
Console.WriteLine("Merged:");
foreach (var person in people.MergePreserveOrder3(pp => pp.Age))
{
Console.WriteLine("\t{0}", person.Name);
}
List 1:
8yo, 22yo, 47yo, 49yo
List 2:
35yo, 47yo, 60yo
List 3:
28yo, 55yo, 64yo
Merged:
8yo
22yo
28yo
35yo
47yo
47yo
49yo
55yo
60yo
64yo
Improved with .NET 4.0's Tuple support:
public static IEnumerable<T> MergePreserveOrder4<T, TOrder>(
this IEnumerable<IEnumerable<T>> aa,
Func<T, TOrder> orderFunc) where TOrder : IComparable<TOrder>
{
var items = aa.Select(xx => xx.GetEnumerator())
.Where(ee => ee.MoveNext())
.Select(ee => Tuple.Create(orderFunc(ee.Current), ee))
.OrderBy(ee => ee.Item1).ToList();
while (items.Count > 0)
{
yield return items[0].Item2.Current;
var next = items[0];
items.RemoveAt(0);
if (next.Item2.MoveNext())
{
var value = orderFunc(next.Item2.Current);
var ii = 0;
for (; ii < items.Count; ++ii)
{
if (value.CompareTo(items[ii].Item1) <= 0)
{ // NB: using a tuple to minimize calls to orderFunc
items.Insert(ii, Tuple.Create(value, next.Item2));
break;
}
}
if (ii == items.Count) items.Add(Tuple.Create(value, next.Item2));
}
else next.Item2.Dispose(); // woops! can't forget IDisposable
}
}

One guess I would make that might improve clarity and performance is this:
1. Create a priority queue over pairs of (T, IEnumerator<T>), ordered according to your comparison function on T.
2. For each IEnumerable<T> being merged, add its first item to the priority queue, annotated with a reference to the enumerator it came from.
3. While the priority queue is not empty:
   - Extract the minimum element from the priority queue.
   - Advance the enumerator in its annotation to the next element.
   - If MoveNext() returned true, add the next element to the priority queue, annotated with a reference to the enumerator you just advanced.
   - If MoveNext() returned false, don't add anything to the priority queue.
   - Yield the extracted element.
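The steps above could be transcribed roughly as follows, assuming .NET 6 or later for PriorityQueue<TElement, TPriority>; PriorityQueueMerge and MergeOrdered are hypothetical names. Each queue entry pairs the current item with the enumerator it came from, which is the annotation the steps describe. PriorityQueue is not stable, so items with equal keys may come out in either order, which the question says is acceptable.

```csharp
using System;
using System.Collections.Generic;

public static class PriorityQueueMerge
{
    // Each queue entry is (current item, enumerator it came from), prioritised by orderBy(item).
    public static IEnumerable<T> MergeOrdered<T, TOrder>(
        IEnumerable<IEnumerable<T>> orderedLists, Func<T, TOrder> orderBy)
    {
        var enumerators = new List<IEnumerator<T>>();
        var queue = new PriorityQueue<(T Item, IEnumerator<T> Source), TOrder>();
        try
        {
            // Seed the queue with the first item of every non-empty list.
            foreach (var list in orderedLists)
            {
                var e = list.GetEnumerator();
                enumerators.Add(e);
                if (e.MoveNext()) queue.Enqueue((e.Current, e), orderBy(e.Current));
            }
            while (queue.Count > 0)
            {
                // Extract the minimum and advance the enumerator it came from.
                var (item, source) = queue.Dequeue();
                if (source.MoveNext())
                    queue.Enqueue((source.Current, source), orderBy(source.Current));
                yield return item;
            }
        }
        finally
        {
            // Disposes every enumerator, including when the caller stops iterating early.
            foreach (var e in enumerators) e.Dispose();
        }
    }
}
```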

How many lists do you expect to need to merge? It looks like your algorithm will not be efficient if you have many different lists to merge. This line is the issue:
var min = enumerators.OrderBy(t => orderBy(t.Value)).FirstOrDefault();
This will be run once for each element in all the lists, so your runtime will be O(n * m), where n is the TOTAL number of elements in all the lists, and m is the number of lists. Expressed in terms of the average length a of a list in the list of lists, the runtime is O(a * m^2).
If you are going to need to merge a lot of lists, I would suggest using a heap. Then each iteration you can remove the smallest value from the heap, and add the next element to the heap from the list that the smallest value came from.

Here's a solution with NO SORTING, just the minimum number of comparisons. (I omitted the actual order func passing for simplicity.) Updated to build a balanced tree:
/// <summary>
/// Merge a pair of ordered lists
/// </summary>
public static IEnumerable<T> Merge<T>(IEnumerable<T> aList, IEnumerable<T> bList)
where T:IComparable<T>
{
using (var a = aList.GetEnumerator()) // dispose the enumerator even if the caller stops early
{
bool aOK = a.MoveNext();
foreach (var b in bList)
{
// emit everything from a that sorts at or before b, then b itself
while (aOK && a.Current.CompareTo(b) <= 0) {yield return a.Current; aOK = a.MoveNext();}
yield return b;
}
// And anything left in a
while (aOK) { yield return a.Current; aOK = a.MoveNext(); }
}
}
/// <summary>
/// Merge lots of sorted lists
/// </summary>
public static IEnumerable<T> Merge<T>(IEnumerable<IEnumerable<T>> listOfLists)
where T : IComparable<T>
{
int n = listOfLists.Count();
if (n < 2)
return listOfLists.FirstOrDefault();
else
return Merge (Merge(listOfLists.Take(n/2)), Merge(listOfLists.Skip(n/2)));
}
public static void Main(string[] args)
{
var sample = Enumerable.Range(1, 5).Select((i) => Enumerable.Range(i, i+5).Select(j => string.Format("Test {0:00}", j)));
Console.WriteLine("Merged:");
foreach (var result in Merge(sample))
{
Console.WriteLine("\t{0}", result);
}
}

Here is a solution that has very good complexity analysis and that is considerably shorter than the other solutions proposed.
public static IEnumerable<T> Merge<T>(this IEnumerable<IEnumerable<T>> self)
where T : IComparable<T>
{
var es = self.Select(x => x.GetEnumerator()).Where(e => e.MoveNext());
var tmp = es.ToDictionary(e => e.Current);
var dict = new SortedDictionary<T, IEnumerator<T>>(tmp);
while (dict.Count > 0)
{
var key = dict.Keys.First();
var cur = dict[key];
dict.Remove(key);
yield return cur.Current;
if (cur.MoveNext())
dict.Add(cur.Current, cur);
}
}

Here is my solution:
The algorithm takes the first element of each list and puts them into a small helper class (a sorted list that accepts multiple elements with the same value). This sorted list uses a binary insert.
So the first element in this list is the element we want to return next. After returning it, we remove it from the sorted list and insert the next element from its original source list (as long as that list contains more elements). Again, we can return the first element of our sorted list. Once the sorted list is empty, we have used all elements from all source lists and are done.
This solution uses fewer foreach statements and no OrderBy in each step, which should improve the runtime behaviour. Only the binary insert has to be done again and again.
IEnumerable<T> MergeOrderedLists<T, TOrder>( IEnumerable<IEnumerable<T>> orderedlists, Func<T, TOrder> orderBy )
{
// Get an enumerator for each list, create a sortedList
var enumerators = orderedlists.Select( enumerable => enumerable.GetEnumerator() );
var sortedEnumerators = new SortedListAllowingDoublets<TOrder, IEnumerator<T>>();
// Point each enumerator onto the first element
foreach( var enumerator in enumerators )
{
// Missing: assert true as the return value
enumerator.MoveNext();
// Initially add the first value
sortedEnumerators.AddSorted( orderBy( enumerator.Current ), enumerator );
}
// Continue as long as we have elements to return
while( sortedEnumerators.Count != 0 )
{
// The first element of the sortedEnumerator list always
// holds the next element to return
var enumerator = sortedEnumerators[0].Value;
// Return this enumerators current value
yield return enumerator.Current;
// Remove the element we just returned
sortedEnumerators.RemoveAt( 0 );
// Check if there is another element in the list of the enumerator
if( enumerator.MoveNext() )
{
// Ok, so add it to the sorted list
sortedEnumerators.AddSorted( orderBy( enumerator.Current ), enumerator );
}
}
}
My helper class (using a simple binary insert):
private class SortedListAllowingDoublets<TOrder, T> : Collection<KeyValuePair<TOrder, T>> where T : IEnumerator
{
public void AddSorted( TOrder value, T enumerator )
{
Insert( GetSortedIndex( value, 0, Count - 1 ), new KeyValuePair<TOrder, T>( value, enumerator ) );
}
private int GetSortedIndex( TOrder item, int startIndex, int endIndex )
{
if( startIndex > endIndex )
{
return startIndex;
}
var midIndex = startIndex + ( endIndex - startIndex ) / 2;
return Comparer<TOrder>.Default.Compare( this[midIndex].Key, item ) < 0 ? GetSortedIndex( item, midIndex + 1, endIndex ) : GetSortedIndex( item, startIndex, midIndex - 1 );
}
}
What's not implemented right now: check for an empty list, which will cause problems.
And the SortedListAllowingDoublets class could be improved to take a comparer instead of using the Comparer<TOrder>.Default on its own.

Here is a LINQ-friendly solution based on Wintellect's OrderedBag:
public static IEnumerable<T> MergeOrderedLists<T, TOrder>(this IEnumerable<IEnumerable<T>> orderedLists, Func<T, TOrder> orderBy)
where TOrder : IComparable<TOrder>
{
var enumerators = new OrderedBag<IEnumerator<T>>(orderedLists
.Select(enumerable => enumerable.GetEnumerator())
.Where(enumerator => enumerator.MoveNext()),
(x, y) => orderBy(x.Current).CompareTo(orderBy(y.Current)));
while (enumerators.Count > 0)
{
IEnumerator<T> minEnumerator = enumerators.RemoveFirst();
T minValue = minEnumerator.Current;
if (minEnumerator.MoveNext())
enumerators.Add(minEnumerator);
else
minEnumerator.Dispose();
yield return minValue;
}
}
If you use any enumerator-based solution, don't forget to call Dispose().
And here is a simple test:
[Test]
public void ShouldMergeInOrderMultipleOrderedListWithDuplicateValues()
{
// given
IEnumerable<IEnumerable<int>> orderedLists = new[]
{
new [] {1, 5, 7},
new [] {1, 2, 4, 6, 7}
};
// test
IEnumerable<int> merged = orderedLists.MergeOrderedLists(i => i);
// expect
merged.ShouldAllBeEquivalentTo(new [] { 1, 1, 2, 4, 5, 6, 7, 7 });
}

My version of sixlettervariables's answer. I reduced the number of calls to orderFunc (each element only passes through orderFunc once), and in the case of ties, sorting is skipped. This is optimized for small numbers of sources, larger numbers of elements within each source and possibly an expensive orderFunc.
public static IEnumerable<T> MergePreserveOrder<T, TOrder>(
this IEnumerable<IEnumerable<T>> sources,
Func<T, TOrder> orderFunc)
where TOrder : IComparable<TOrder>
{
Dictionary<TOrder, List<IEnumerator<T>>> keyedSources =
sources.Select(source => source.GetEnumerator())
.Where(e => e.MoveNext())
.GroupBy(e => orderFunc(e.Current))
.ToDictionary(g => g.Key, g => g.ToList());
while (keyedSources.Any())
{
//this is the expensive line
KeyValuePair<TOrder, List<IEnumerator<T>>> firstPair = keyedSources
.OrderBy(kvp => kvp.Key).First();
keyedSources.Remove(firstPair.Key);
foreach(IEnumerator<T> e in firstPair.Value)
{
yield return e.Current;
if (e.MoveNext())
{
TOrder newKey = orderFunc(e.Current);
if (!keyedSources.ContainsKey(newKey))
{
keyedSources[newKey] = new List<IEnumerator<T>>() {e};
}
else
{
keyedSources[newKey].Add(e);
}
}
}
}
}
I'm betting this could be further improved by a SortedDictionary, but am not brave enough to try a solution using one without an editor.
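One way the SortedDictionary idea could look is the sketch below; MergePreserveOrderSorted is a hypothetical name. Keys are order values and each key maps to a queue of enumerators whose current item has that key, so ties are kept rather than overwritten. Finding the smallest key and adding or removing a key should each cost roughly O(log K) for K distinct keys currently in the dictionary, instead of an OrderBy over every key on each step.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedDictionaryMerge
{
    public static IEnumerable<T> MergePreserveOrderSorted<T, TOrder>(
        this IEnumerable<IEnumerable<T>> sources, Func<T, TOrder> orderFunc)
        where TOrder : IComparable<TOrder>
    {
        var keyed = new SortedDictionary<TOrder, Queue<IEnumerator<T>>>();
        var all = new List<IEnumerator<T>>();
        try
        {
            foreach (var source in sources)
            {
                var e = source.GetEnumerator();
                all.Add(e);
                if (e.MoveNext()) Enqueue(keyed, orderFunc(e.Current), e);
            }
            while (keyed.Count > 0)
            {
                var smallest = keyed.First();          // in-order first entry = smallest key
                var e = smallest.Value.Dequeue();
                if (smallest.Value.Count == 0) keyed.Remove(smallest.Key);
                yield return e.Current;
                if (e.MoveNext()) Enqueue(keyed, orderFunc(e.Current), e);
            }
        }
        finally
        {
            foreach (var e in all) e.Dispose();
        }
    }

    // Appends the enumerator to the queue for its key, creating the queue on first use.
    private static void Enqueue<T, TOrder>(
        SortedDictionary<TOrder, Queue<IEnumerator<T>>> keyed, TOrder key, IEnumerator<T> e)
    {
        if (!keyed.TryGetValue(key, out var queue))
            keyed[key] = queue = new Queue<IEnumerator<T>>();
        queue.Enqueue(e);
    }
}
```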

Here is a modern implementation that is based on the new and powerful PriorityQueue<TElement, TPriority> class (.NET 6). It combines the low overhead of user7116's solution with the O(log N) complexity of tanascius's solution (where N is the number of sources). It outperforms most of the other implementations presented in this question (I didn't measure them all), either slightly for small N, or massively for large N.
public static IEnumerable<TSource> MergeSorted<TSource, TKey>(
this IEnumerable<IEnumerable<TSource>> sortedSources,
Func<TSource, TKey> keySelector,
IComparer<TKey> comparer = default)
{
List<IEnumerator<TSource>> enumerators = new();
try
{
foreach (var source in sortedSources)
enumerators.Add(source.GetEnumerator());
var queue = new PriorityQueue<IEnumerator<TSource>, TKey>(comparer);
foreach (var enumerator in enumerators)
{
if (enumerator.MoveNext())
queue.Enqueue(enumerator, keySelector(enumerator.Current));
}
while (queue.TryPeek(out var enumerator, out _))
{
yield return enumerator.Current;
if (enumerator.MoveNext())
queue.EnqueueDequeue(enumerator, keySelector(enumerator.Current));
else
queue.Dequeue();
}
}
finally
{
foreach (var enumerator in enumerators) enumerator.Dispose();
}
}
In order to keep the code simple, all enumerators are disposed at the end of the combined enumeration. A more sophisticated implementation would dispose each enumerator immediately after its completion.

This looks like a terribly useful function to have around, so I decided to take a stab at it. My approach is a lot like heightechrider's in that it breaks the problem down into merging two sorted IEnumerables into one, then merging the result with the next one in the list. There is most likely some optimization you can do, but it works with my simple test case:
public static IEnumerable<T> mergeSortedEnumerables<T>(
this IEnumerable<IEnumerable<T>> listOfLists,
Func<T, T, Boolean> func)
{
IEnumerable<T> l1 = new List<T>{};
foreach (var l in listOfLists)
{
l1 = l1.mergeTwoSorted(l, func);
}
foreach (var t in l1)
{
yield return t;
}
}
public static IEnumerable<T> mergeTwoSorted<T>(
this IEnumerable<T> l1,
IEnumerable<T> l2,
Func<T, T, Boolean> func)
{
using (var enumerator1 = l1.GetEnumerator())
using (var enumerator2 = l2.GetEnumerator())
{
bool enum1 = enumerator1.MoveNext();
bool enum2 = enumerator2.MoveNext();
while (enum1 || enum2)
{
T t1 = enumerator1.Current;
T t2 = enumerator2.Current;
//if they are both false
if (!enum1 && !enum2)
{
break;
}
//if enum1 is false
else if (!enum1)
{
enum2 = enumerator2.MoveNext();
yield return t2;
}
//if enum2 is false
else if (!enum2)
{
enum1 = enumerator1.MoveNext();
yield return t1;
}
//they are both true
else
{
//if func returns true then t1 < t2
if (func(t1, t2))
{
enum1 = enumerator1.MoveNext();
yield return t1;
}
else
{
enum2 = enumerator2.MoveNext();
yield return t2;
}
}
}
}
}
Then to test it:
Func<int, int, bool> compareInts = (a, b) => a < b; // true when a should come first
List<int> ws = new List<int>() { 1, 8, 9, 16, 17, 21 };
List<int> xs = new List<int>() { 2, 7, 10, 15, 18 };
List<int> ys = new List<int>() { 3, 6, 11, 14 };
List<int> zs = new List<int>() { 4, 5, 12, 13, 19, 20 };
List<IEnumerable<int>> lss = new List<IEnumerable<int>> { ws, xs, ys, zs };
foreach (var v in lss.mergeSortedEnumerables(compareInts))
{
Console.WriteLine(v);
}

I was asked this question as an interview question this evening and did not have a great answer in the 20 or so minutes allotted. So I forced myself to write an algorithm without doing any searches. The constraint was that the inputs were already sorted. Here's my code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Merger
{
class Program
{
static void Main(string[] args)
{
int[] a = { 1, 3, 6, 102, 105, 230 };
int[] b = { 101, 103, 112, 155, 231 };
var mm = new MergeMania();
foreach(var val in mm.Merge<int>(a, b))
{
Console.WriteLine(val);
}
Console.ReadLine();
}
}
public class MergeMania
{
public IEnumerable<T> Merge<T>(params IEnumerable<T>[] sortedSources)
where T : IComparable
{
if (sortedSources == null || sortedSources.Length == 0)
throw new ArgumentNullException("sortedSources");
//1. fetch enumerators for each source
var enums = (from n in sortedSources
select n.GetEnumerator()).ToArray();
//2. fetch enumerators that have at least one value
var enumsWithValues = (from n in enums
where n.MoveNext()
select n).ToArray();
if (enumsWithValues.Length == 0) yield break; //nothing to iterate over
//3. sort by current value in List<IEnumerator<T>>
var enumsByCurrent = (from n in enumsWithValues
orderby n.Current
select n).ToList();
//4. loop through
while (true)
{
//yield up the lowest value
yield return enumsByCurrent[0].Current;
//move the pointer on the enumerator with that lowest value
if (!enumsByCurrent[0].MoveNext())
{
//remove the first item in the list
enumsByCurrent.RemoveAt(0);
//check for empty
if (enumsByCurrent.Count == 0) break; //we're done
}
enumsByCurrent = enumsByCurrent.OrderBy(x => x.Current).ToList();
}
}
}
}
Hope it helps.

An attempt to improve on @cdiggins's answer.
This implementation works correctly if two elements that compare as equal are contained in two different sequences (i.e. it doesn't have the flaw mentioned by @ChadHenderson).
The algorithm is described in Wikipedia, the complexity is O(m log n), where n is the number of lists being merged and m is the sum of the lengths of the lists.
The OrderedBag<T> from Wintellect.PowerCollections is used instead of a heap-based priority queue, but it doesn't change the complexity.
public static IEnumerable<T> Merge<T>(
IEnumerable<IEnumerable<T>> listOfLists,
Func<T, T, int> comparison = null)
{
IComparer<T> cmp = comparison != null
? Comparer<T>.Create(new Comparison<T>(comparison))
: Comparer<T>.Default;
List<IEnumerator<T>> es = listOfLists
.Select(l => l.GetEnumerator())
.Where(e => e.MoveNext())
.ToList();
var bag = new OrderedBag<IEnumerator<T>>(
(e1, e2) => cmp.Compare(e1.Current, e2.Current));
es.ForEach(e => bag.Add(e));
while (bag.Count > 0)
{
IEnumerator<T> e = bag.RemoveFirst();
yield return e.Current;
if (e.MoveNext())
{
bag.Add(e);
}
}
}

Each list being merged should already be sorted. This method keeps equal elements in the order of their lists: for example, if elements Ti == Tj come from list i and list j respectively (i < j), then Ti will be in front of Tj in the merged result.
The complexity is O(mn), where n is the number of lists being merged and m is the sum of the lengths of the lists.
public static IEnumerable<T> Merge<T, TOrder>(this IEnumerable<IEnumerable<T>> TEnumerable_2, Func<T, TOrder> orderFunc, IComparer<TOrder> cmp=null)
{
if (cmp == null)
{
cmp = Comparer<TOrder>.Default;
}
List<IEnumerator<T>> TEnumeratorLt = TEnumerable_2
.Select(l => l.GetEnumerator())
.Where(e => e.MoveNext())
.ToList();
while (TEnumeratorLt.Count > 0)
{
int intMinIndex;
IEnumerator<T> TSmallest = TEnumeratorLt.GetMin(TElement => orderFunc(TElement.Current), out intMinIndex, cmp);
yield return TSmallest.Current;
if (TSmallest.MoveNext() == false)
{
TEnumeratorLt.RemoveAt(intMinIndex);
}
}
}
/// <summary>
/// Get the first min item in an IEnumerable, and return the index of it by minIndex
/// </summary>
public static T GetMin<T, TOrder>(this IEnumerable<T> self, Func<T, TOrder> orderFunc, out int minIndex, IComparer<TOrder> cmp = null)
{
if (self == null) throw new ArgumentNullException("self");
IEnumerator<T> selfEnumerator = self.GetEnumerator();
if (!selfEnumerator.MoveNext()) throw new ArgumentException("List is empty.", "self");
if (cmp == null) cmp = Comparer<TOrder>.Default;
T min = selfEnumerator.Current;
minIndex = 0;
int intCount = 1;
while (selfEnumerator.MoveNext ())
{
if (cmp.Compare(orderFunc(selfEnumerator.Current), orderFunc(min)) < 0)
{
min = selfEnumerator.Current;
minIndex = intCount;
}
intCount++;
}
return min;
}

I've taken a more functional approach; I hope this reads well.
First of all, here is the merge method itself:
public static IEnumerable<T> MergeSorted<T>(IEnumerable<IEnumerable<T>> xss) where T :IComparable
{
var stacks = xss.Select(xs => new EnumerableStack<T>(xs)).ToList();
while (true)
{
if (stacks.All(x => x.IsEmpty)) yield break;
yield return
stacks
.Where(x => !x.IsEmpty)
.Select(x => new { peek = x.Peek(), x })
.MinBy(x => x.peek)
.x.Pop();
}
}
The idea is that we turn each IEnumerable into an EnumerableStack that has Peek(), Pop() and IsEmpty members.
It works just like a regular stack. Note that calling IsEmpty might advance the wrapped IEnumerable.
Here is the code:
/// <summary>
/// Wraps IEnumerable in Stack like wrapper
/// </summary>
public class EnumerableStack<T>
{
private enum StackState
{
Pending,
HasItem,
Empty
}
private readonly IEnumerator<T> _enumerator;
private StackState _state = StackState.Pending;
public EnumerableStack(IEnumerable<T> xs)
{
_enumerator = xs.GetEnumerator();
}
public T Pop()
{
var res = Peek(isEmptyMessage: "Cannot Pop from empty EnumerableStack");
_state = StackState.Pending;
return res;
}
public T Peek()
{
return Peek(isEmptyMessage: "Cannot Peek from empty EnumerableStack");
}
public bool IsEmpty
{
get
{
if (_state == StackState.Empty) return true;
if (_state == StackState.HasItem) return false;
ReadNext();
return _state == StackState.Empty;
}
}
private T Peek(string isEmptyMessage)
{
if (_state != StackState.HasItem)
{
if (_state == StackState.Empty) throw new InvalidOperationException(isEmptyMessage);
ReadNext();
if (_state == StackState.Empty) throw new InvalidOperationException(isEmptyMessage);
}
return _enumerator.Current;
}
private void ReadNext()
{
_state = _enumerator.MoveNext() ? StackState.HasItem : StackState.Empty;
}
}
Finally, here is the MinBy extension method in case you haven't written one on your own already (on .NET 6 or later, System.Linq already provides Enumerable.MinBy):
public static T MinBy<T, TS>(this IEnumerable<T> xs, Func<T, TS> selector) where TS : IComparable
{
using (var en = xs.GetEnumerator())
{
if (!en.MoveNext()) throw new InvalidOperationException("Sequence contains no elements.");
T min = en.Current;
TS minVal = selector(min);
while (en.MoveNext())
{
var x = en.Current;
var val = selector(x);
if (val.CompareTo(minVal) < 0)
{
min = x;
minVal = val;
}
}
return min;
}
}

This is an alternate solution:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Reflection;
using System.Data;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Person
{
public string Name
{
get;
set;
}
public int Age
{
get;
set;
}
}
public class Program
{
public static void Main()
{
Person[] persons1 = new Person[] { new Person() { Name = "Ahmed", Age = 20 }, new Person() { Name = "Ali", Age = 40 } };
Person[] persons2 = new Person[] { new Person() { Name = "Zaid", Age = 21 }, new Person() { Name = "Hussain", Age = 22 } };
Person[] persons3 = new Person[] { new Person() { Name = "Linda", Age = 19 }, new Person() { Name = "Souad", Age = 60 } };
Person[][] personArrays = new Person[][] { persons1, persons2, persons3 };
foreach(Person person in MergeOrderedLists<Person, int>(personArrays, person => person.Age))
{
Console.WriteLine("{0} {1}", person.Name, person.Age);
}
Console.ReadLine();
}
static IEnumerable<T> MergeOrderedLists<T, TOrder>(IEnumerable<IEnumerable<T>> orderedlists, Func<T, TOrder> orderBy)
{
List<IEnumerator<T>> enumeratorsWithData = orderedlists.Select(enumerable => enumerable.GetEnumerator())
.Where(enumerator => enumerator.MoveNext()).ToList();
while (enumeratorsWithData.Count > 0)
{
IEnumerator<T> minEnumerator = enumeratorsWithData[0];
for (int i = 1; i < enumeratorsWithData.Count; i++)
if (((IComparable<TOrder>)orderBy(minEnumerator.Current)).CompareTo(orderBy(enumeratorsWithData[i].Current)) >= 0)
minEnumerator = enumeratorsWithData[i];
yield return minEnumerator.Current;
if (!minEnumerator.MoveNext())
enumeratorsWithData.Remove(minEnumerator);
}
}
}
}

I'm not sure LINQ is smart enough to take advantage of the existing sort order, but the simplest option is to combine the lists and sort the result:
IEnumerable<string> BiggerSortedList = BigListOne.Union(BigListTwo).OrderBy(s => s);
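Two details are easy to miss with this one-liner. Union has set semantics, so duplicates across (or within) the lists are dropped, while Concat keeps them. And as far as I know, OrderBy does not detect already-sorted runs, so it performs a full comparison sort of the combined sequence rather than a linear merge. A small sketch contrasting the two (UnionVsConcat and its methods are illustrative names):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class UnionVsConcat
{
    // Union removes duplicates, so the result can be shorter than both inputs combined.
    public static List<int> ViaUnion(IEnumerable<int> a, IEnumerable<int> b) =>
        a.Union(b).OrderBy(x => x).ToList();

    // Concat keeps every element; OrderBy then sorts the combined sequence.
    public static List<int> ViaConcat(IEnumerable<int> a, IEnumerable<int> b) =>
        a.Concat(b).OrderBy(x => x).ToList();
}
```

With { 1, 3, 5 } and { 1, 2 }, the Union version yields 1, 2, 3, 5 and the Concat version yields 1, 1, 2, 3, 5.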

C# - elegant way of partitioning a list?

I'd like to partition a list into a list of lists, by specifying the number of elements in each partition.
For instance, suppose I have the list {1, 2, ... 11}, and would like to partition it such that each set has 4 elements, with the last set filling as many elements as it can. The resulting partition would look like {{1..4}, {5..8}, {9..11}}
What would be an elegant way of writing this?

Here is an extension method that will do what you want:
public static IEnumerable<List<T>> Partition<T>(this IList<T> source, Int32 size)
{
for (int i = 0; i < (source.Count / size) + (source.Count % size > 0 ? 1 : 0); i++)
yield return new List<T>(source.Skip(size * i).Take(size));
}
Edit: Here is a much cleaner version of the function:
public static IEnumerable<List<T>> Partition<T>(this IList<T> source, Int32 size)
{
for (int i = 0; i < Math.Ceiling(source.Count / (Double)size); i++)
yield return new List<T>(source.Skip(size * i).Take(size));
}
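On .NET 6 or later, LINQ includes Enumerable.Chunk, which does exactly this: it returns arrays of at most size elements, with the last array holding the remainder. A minimal sketch, assuming .NET 6+ (ChunkDemo and SplitIntoFours are illustrative names):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ChunkDemo
{
    // Enumerable.Chunk splits the sequence into arrays of at most 4 elements;
    // the final array holds whatever is left over.
    public static List<int[]> SplitIntoFours(IEnumerable<int> source) =>
        source.Chunk(4).ToList();
}
```

For the list { 1, ..., 11 } this gives { 1..4 }, { 5..8 } and { 9..11 }, matching the partition asked for in the question.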

Using LINQ you could cut your groups up in a single line of code like this...
var x = new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
var groups = x.Select((i, index) => new
{
i,
index
}).GroupBy(group => group.index / 4, element => element.i);
You could then iterate over the groups like the following...
foreach (var group in groups)
{
Console.WriteLine("Group: {0}", group.Key);
foreach (var item in group)
{
Console.WriteLine("\tValue: {0}", item);
}
}
and you'll get an output that looks like this...
Group: 0
	Value: 1
	Value: 2
	Value: 3
	Value: 4
Group: 1
	Value: 5
	Value: 6
	Value: 7
	Value: 8
Group: 2
	Value: 9
	Value: 10
	Value: 11

Something like (untested air code):
IEnumerable<IList<T>> PartitionList<T>(IList<T> list, int maxCount)
{
List<T> partialList = new List<T>(maxCount);
foreach(T item in list)
{
if (partialList.Count == maxCount)
{
yield return partialList;
partialList = new List<T>(maxCount);
}
partialList.Add(item);
}
if (partialList.Count > 0) yield return partialList;
}
This returns an enumeration of lists rather than a list of lists, but you can easily wrap the result in a list:
IList<IList<T>> listOfLists = new List<IList<T>>(PartitionList<T>(list, maxCount));

This version avoids grouping, arithmetic and re-iteration.
The method avoids unnecessary calculations, comparisons and allocations. Parameter validation is included.
Here is a working demonstration on fiddle.
public static IEnumerable<IList<T>> Partition<T>(
this IEnumerable<T> source,
int size)
{
if (size < 2)
{
throw new ArgumentOutOfRangeException(
nameof(size),
size,
"Must be greater than or equal to 2.");
}
T[] partition;
int count;
using (var e = source.GetEnumerator())
{
if (e.MoveNext())
{
partition = new T[size];
partition[0] = e.Current;
count = 1;
}
else
{
yield break;
}
while(e.MoveNext())
{
partition[count] = e.Current;
count++;
if (count == size)
{
yield return partition;
count = 0;
partition = new T[size];
}
}
}
if (count > 0)
{
Array.Resize(ref partition, count);
yield return partition;
}
}

var yourList = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
var groupSize = 4;
// here's the actual query that does the grouping...
var query = yourList
.Select((x, i) => new { x, i })
.GroupBy(i => i.i / groupSize, x => x.x);
// and here's a quick test to ensure that it worked properly...
foreach (var group in query)
{
foreach (var item in group)
{
Console.Write(item + ",");
}
Console.WriteLine();
}
If you need an actual List<List<T>> rather than an IEnumerable<IEnumerable<T>> then change the query as follows:
var query = yourList
.Select((x, i) => new { x, i })
.GroupBy(i => i.i / groupSize, x => x.x)
.Select(g => g.ToList())
.ToList();

Or in .NET 2.0 you would do this:
static void Main(string[] args)
{
int[] values = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
List<int[]> items = new List<int[]>(SplitArray(values, 4));
}
static IEnumerable<T[]> SplitArray<T>(T[] items, int size)
{
for (int index = 0; index < items.Length; index += size)
{
int remains = Math.Min(size, items.Length-index);
T[] segment = new T[remains];
Array.Copy(items, index, segment, 0, remains);
yield return segment;
}
}

public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> list, int size)
{
while (list.Any()) { yield return list.Take(size); list = list.Skip(size); }
}
and for the special case of String
public static IEnumerable<string> Partition(this string str, int size)
{
return str.Partition<char>(size).Select(AsString);
}
public static string AsString(this IEnumerable<char> charList)
{
return new string(charList.ToArray());
}
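Because `list` is reassigned to an ever-longer Skip chain, every Any, Take and Skip call starts again from the beginning of the original source, so a lazy source is enumerated many more times than its length (roughly growing with the square of the number of chunks). A quick way to observe this, using a hypothetical CountingRange helper that counts every item it hands out:

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: the sequence 1..count that records how many items it
// has produced across every enumeration.
public sealed class CountingRange : IEnumerable<int>
{
    private readonly int _count;
    public CountingRange(int count) { _count = count; }
    public int Produced { get; private set; }

    public IEnumerator<int> GetEnumerator()
    {
        for (int i = 1; i <= _count; i++)
        {
            Produced++;
            yield return i;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class TakeSkipPartition
{
    // Same Any/Take/Skip loop as the answer above.
    public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> list, int size)
    {
        while (list.Any()) { yield return list.Take(size); list = list.Skip(size); }
    }
}
```

Partitioning 11 items into chunks of 4 this way produces chunk sizes 4, 4 and 3, but Produced ends up well above 11. For an in-memory list this is usually harmless; for a database query or a file reader it means repeated round trips.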

Using ArraySegment<T> might be a readable and short solution (converting your list to an array is required):
var list = new List<int>() { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 }; //Added 0 in front on purpose in order to enhance simplicity.
int[] array = list.ToArray();
int step = 4;
List<int[]> listSegments = new List<int[]>();
for(int i = 0; i < array.Length; i+=step)
{
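int[] segment = new ArraySegment<int>(array, i, Math.Min(step, array.Length - i)).ToArray(); //Math.Min keeps the last segment inside the array when the length isn't a multiple of step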
listSegments.Add(segment);
}

I'm not sure why Jochem's answer using ArraySegment was voted down. It could be really useful as long as you are not going to need to extend the segments (cast to IList). For example, imagine that you are trying to pass segments into a TPL Dataflow pipeline for concurrent processing. Passing the segments in as IList instances allows the same code to deal with arrays and lists agnostically.
Of course, that raises the question: why not just derive a ListSegment class that does not require wasting memory by calling ToArray()? The answer is that arrays can actually be processed marginally faster in some situations (slightly faster indexing). But you would have to be doing some fairly hardcore processing to notice much of a difference. More importantly, there is no good way to protect against random insert and remove operations by other code holding a reference to the list.
Calling ToArray() on a million value numeric list takes about 3 milliseconds on my workstation. That's usually not too great a price to pay when you're using it to gain the benefits of more robust thread safety in concurrent operations, without incurring the heavy cost of locking.
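A sketch of the idea discussed above, keeping the segments as ArraySegment<T> views typed as IList<T> instead of copying them with ToArray(). Segments.Split is a hypothetical helper name. Writes through the IList go into the shared array, while Add, Insert and Remove throw NotSupportedException, so other code cannot insert or remove elements through a segment.

```csharp
using System;
using System.Collections.Generic;

public static class Segments
{
    // Splits an array into views of at most `size` elements. No elements are
    // copied: each ArraySegment<T> points into the original array.
    public static List<IList<T>> Split<T>(T[] array, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        var result = new List<IList<T>>();
        for (int i = 0; i < array.Length; i += size)
        {
            // Math.Min keeps the last segment from running past the end of the array.
            result.Add(new ArraySegment<T>(array, i, Math.Min(size, array.Length - i)));
        }
        return result;
    }
}
```

The trade-off is that the segments share storage with the source array, so callers must not expect them to be independent copies.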

You could use an extension method:
public static IList<HashSet<T>> Partition<T>(this IEnumerable<T> input, Func<T, object> partitionFunc)
{
Dictionary<object, HashSet<T>> partitions = new Dictionary<object, HashSet<T>>();
object currentKey = null;
foreach (T item in input ?? Enumerable.Empty<T>())
{
currentKey = partitionFunc(item);
if (!partitions.ContainsKey(currentKey))
{
partitions[currentKey] = new HashSet<T>();
}
partitions[currentKey].Add(item);
}
return partitions.Values.ToList();
}
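Note that this method partitions by a key rather than by a fixed size. The same result can be sketched with GroupBy, which, unlike Dictionary, also accepts a null key and returns groups in the order their keys first appear. PartitionBy and KeyPartition are illustrative names, and the generic TKey replaces the original object key:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyPartition
{
    // Groups items by the key returned from partitionFunc and turns each group
    // into a HashSet<T>, mirroring the dictionary-based method above.
    public static IList<HashSet<T>> PartitionBy<T, TKey>(
        this IEnumerable<T> input, Func<T, TKey> partitionFunc) =>
        (input ?? Enumerable.Empty<T>())
            .GroupBy(partitionFunc)
            .Select(g => new HashSet<T>(g))
            .ToList();
}
```

For example, partitioning { 1, 2, 3, 4, 5, 6 } by x % 3 gives the sets { 1, 4 }, { 2, 5 } and { 3, 6 }, in that order.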

To avoid multiple checks, unnecessary instantiations, and repetitive iterations, you could use the following code:
namespace System.Collections.Generic
{
using Linq;
using Runtime.CompilerServices;
public static class EnumerableExtender
{
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsEmpty<T>(this IEnumerable<T> enumerable) => !enumerable?.GetEnumerator()?.MoveNext() ?? true;
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> source, int size)
{
if (source == null)
throw new ArgumentNullException(nameof(source));
if (size < 2)
throw new ArgumentOutOfRangeException(nameof(size));
IEnumerable<T> items = source;
IEnumerable<T> partition;
while (true)
{
partition = items.Take(size);
if (partition.IsEmpty())
yield break;
else
yield return partition;
items = items.Skip(size);
}
}
}
}
