Here is another way to split a List into smaller lists of N size.
The purpose of this post is to share knowledge and opinions about a LINQ-based approach, without using "for" loops or ranges directly.
Example: I have a list of 100 items and I need to make it into 10 lists.
I use the following code; does anyone have a better or more performant way?
var subLists = myList.Select((x, i) => new { Index = i, Item = x })
                     .GroupBy(x => x.Index / maxItemsPerSublist)
                     .Select(g => g.Select(v => v.Item).ToList());
It's a slow operation, particularly this projection:
(x, i) => new { Index = i, Item = x }
Here's an extension method that will work with any list
public static IEnumerable<List<T>> SplitList<T>(this List<T> items, int size)
{
    for (int i = 0; i < items.Count; i += size)
    {
        yield return items.GetRange(i, Math.Min(size, items.Count - i));
    }
}
Or, for better performance:
public static List<List<T>> SplitList<T>(this List<T> items, int size)
{
    List<List<T>> list = new List<List<T>>();
    for (int i = 0; i < items.Count; i += size)
        list.Add(items.GetRange(i, Math.Min(size, items.Count - i)));
    return list;
}
Let's create a generic answer. One that works for any sequence of any length, where you want to split the sequence into a sequence of sub-sequences, where every sub-sequence has a specified length, except maybe for the last:
For example:
IEnumerable<int> items = new List<int> { 10, 11, 12, 13, 14, 15, 16, 17 };
// split into subsequences of length 3:
IEnumerable<IEnumerable<int>> splitSequence = items.Split(3);
// splitSequence is a sequence of 3 subsequences:
// {10, 11, 12},
// {13, 14, 15},
// {16, 17}
We'll do this by creating an extension method. This way, the method Split can be used like any other LINQ function. See extension methods demystified. To make it efficient, I'll enumerate the source only once, and I won't enumerate any more items than requested.
public static IEnumerable<IEnumerable<TSource>> Split<TSource>(
    this IEnumerable<TSource> source, int splitSize)
{
    // TODO: exception if null source, or non-positive splitSize
    // Get the enumerator and enumerate no more elements than requested
    using (IEnumerator<TSource> enumerator = source.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            // there are still items in the source; fill a new sub-sequence
            var subSequence = new List<TSource>(splitSize);
            do
            {   // add the current item to the list:
                subSequence.Add(enumerator.Current);
            }
            // repeat until the subSequence is full or the source has no more elements:
            while (subSequence.Count < splitSize && enumerator.MoveNext());
            // return the subSequence
            yield return subSequence;
        }
    }
}
Usage:
// Get all Students that live in New York, split them into groups of 10 Students
// and return groups that have at least one Law Student
var newYorkLawStudentGroups = GetStudents()
    .Where(student => student.UniversityLocation == "New York")
    .Split(10)
    .Where(studentGroup => studentGroup.Any(student => student.Study == "Law"));
This question is not a duplicate. As mentioned, I am asking whether there is a LINQ form that would be more performant. I am already aware of "for" loops with ranges, for example.
Thank you all for your collaboration, comments and possible solutions!
Related
I know the usual approach for a "variable number of for loops" is a recursive method. But I wonder if I could solve it without recursion, using a Stack instead, since you can replace recursion with an explicit stack.
My example:
I have a variable number of collections and I need to combine every item of every collection with every other item of the other collections.
// example for collections A, B and C:
A (4 items) + B (8 items) + C (10 items)
4 * 8 * 10 = 320 combinations
I need to run through all those 320 combinations. Yet at compile time I don't know whether B or C or D exist. What would a solution look like that uses an instance of Stack instead of a recursive method?
Edit:
I realized a Stack is not necessary here at all, since you can avoid recursion with a simple int array and a few while loops. Thanks for the help and info.
Not with a stack but without recursion.
void Main()
{
var l = new List<List<int>>()
{
new List<int>(){ 1,2,3 },
new List<int>(){ 4,5,6 },
new List<int>(){ 7,8,9 }
};
var result = CartesianProduct(l);
}
static IEnumerable<IEnumerable<T>> CartesianProduct<T>(IEnumerable<IEnumerable<T>> sequences)
{
IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>()};
return sequences.Aggregate(
emptyProduct,
(accumulator, sequence) =>
from accseq in accumulator
from item in sequence
select accseq.Concat(new[] {item})
);
}
Function taken from Computing a Cartesian Product with LINQ.
Here is an example of how to do this. The algorithm is taken from this question - https://stackoverflow.com/a/2419399/5311735 - and converted to C#. Note that it can be made more efficient, but I converted the inefficient version to C# because it better illustrates the concept (you can see the more efficient version in the linked question):
static IEnumerable<T[]> CartesianProduct<T>(IList<IList<T>> collections) {
// this contains the indexes of elements from each collection to combine next
var indexes = new int[collections.Count];
bool done = false;
while (!done) {
// initialize array for next combination
var nextProduct = new T[collections.Count];
// fill it
for (int i = 0; i < collections.Count; i++) {
var collection = collections[i];
nextProduct[i] = collection[indexes[i]];
}
yield return nextProduct;
// now we need to calculate indexes for the next combination
// for that, increase last index by one, until it becomes equal to the length of last collection
// then increase second last index by one until it becomes equal to the length of second last collection
// and so on - basically the same how you would do with regular numbers - 09 + 1 = 10, 099 + 1 = 100 and so on.
var j = collections.Count - 1;
while (true) {
indexes[j]++;
if (indexes[j] < collections[j].Count) {
break;
}
indexes[j] = 0;
j--;
if (j < 0) {
done = true;
break;
}
}
}
}
I have a list, e.g.
List<int> List1 = new List<int>{1, 5, 8, 3, 9};
What is a simple way of repeating the elements in the list to obtain {1, 1, 5, 5, 8, 8, 3, 3, 9, 9}?
The reason I need this is that I am plotting the elements in the list and need to make a "step plot".
var list2 = List1.SelectMany(x => new []{x, x}).ToList();
I would create an (extension) method which enumerates the source and yields each item the required number of times:
public static IEnumerable<T> RepeatItems<T>(this IEnumerable<T> source, int count)
{
foreach(var item in source)
for(int i = 0; i < count; i++)
yield return item;
}
Thus you will avoid creating a huge number of arrays. Usage:
var result = List1.RepeatItems(2).ToList();
If you just need to duplicate items, then the solution is even simpler:
public static IEnumerable<T> DuplicateItems<T>(this IEnumerable<T> source)
{
foreach(var item in source)
{
yield return item;
yield return item;
}
}
Usage of DuplicateItems extension:
var result = List1.DuplicateItems().ToList();
Also, if you will only enumerate the result, you don't need to convert it to a list. If you will not modify (add/remove) items in the result, converting it to an array is more efficient.
Taken from the comments above,
var sequence2 = List1.SelectMany(x => Enumerable.Repeat(x, 2));
is a better solution because it avoids pointless allocation of memory. It would also be simpler to change to n repetitions, where the variation in overhead would become more significant.
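For instance, generalizing to n repetitions only means changing the count passed to Enumerable.Repeat (a small sketch using the List1 from the question, with n = 3):

```csharp
using System.Collections.Generic;
using System.Linq;

var List1 = new List<int> { 1, 5, 8, 3, 9 };
// Repeat each element three times, preserving order:
var tripled = List1.SelectMany(x => Enumerable.Repeat(x, 3)).ToList();
// tripled is { 1, 1, 1, 5, 5, 5, 8, 8, 8, 3, 3, 3, 9, 9, 9 }
```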
If you're trying to reduce memory allocations:
// Pre-allocate the space to save time
List<int> dups = new List<int>(List1.Count * 2);
// Avoid allocating an enumerator (hopefully!)
for(int i=0; i<List1.Count; i++)
{
var value = List1[i];
dups.Add(value);
dups.Add(value);
}
It's not LINQ, but it's memory efficient.
I have the following extension method to split a List<T> into a list of List<T>'s with different chunk sizes, but I'm doubting its efficiency. Anything I can do to improve it or is it fine as is?
public static List<List<T>> Split<T>(this List<T> source, params int[] chunkSizes)
{
int totalSize = chunkSizes.Sum();
int sourceCount = source.Count;
if (totalSize > sourceCount)
{
throw new ArgumentException("Sum of chunk sizes is larger than the number of elements in source.", nameof(chunkSizes));
}
List<List<T>> listOfLists = new List<List<T>>(chunkSizes.Length);
int index = 0;
foreach (int chunkSize in chunkSizes)
{
listOfLists.Add(source.GetRange(index, chunkSize));
index += chunkSize;
}
// Get the entire last part if the total size of all the chunks is less than the actual size of the source
if (totalSize < sourceCount)
{
listOfLists.Add(source.GetRange(index, sourceCount - totalSize));
}
return listOfLists;
}
Example code usage:
List<int> list = new List<int> { 1,2,4,5,6,7,8,9,10,12,43,23,453,34,23,112,4,23 };
var result = list.Split(2, 3, 3, 2, 1, 3);
result.ForEach(chunk => Console.WriteLine(string.Join(", ", chunk)));
This gives the desired result, and the final list part has 4 numbers, as the total chunk size is 4 less than the size of my list.
I'm especially doubtful of the GetRange part, as I fear this is just enumerating the same source over and over...
EDIT: I think I know a way to enumerate the source only once: do a foreach over the source itself and keep checking whether the number of iterated elements equals the current chunk size. If so, add the new list and move on to the next chunk size. Thoughts?
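That single-pass idea could look roughly like this (a sketch, not the original method; the helper name SplitSinglePass is made up here):

```csharp
using System.Collections.Generic;

public static class ListSplitSketch
{
    // Walks the source exactly once; cuts a chunk whenever the current
    // chunk size is reached, and puts any remainder into one final list.
    public static List<List<T>> SplitSinglePass<T>(IEnumerable<T> source, params int[] chunkSizes)
    {
        var result = new List<List<T>>();
        var current = new List<T>();
        int chunkIndex = 0;
        foreach (var item in source)
        {
            current.Add(item);
            if (chunkIndex < chunkSizes.Length && current.Count == chunkSizes[chunkIndex])
            {
                result.Add(current);
                current = new List<T>();
                chunkIndex++;
            }
        }
        if (current.Count > 0)
            result.Add(current);    // the leftover part, if any
        return result;
    }
}
```

Usage mirrors the original: `SplitSinglePass(list, 2, 3)` on a 7-element list yields chunks of 2, 3, and a 2-element remainder.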
There is no performance problem with this code. GetRange is documented to be O(chunkSize), and this is also easy to deduce since one of the most important properties of List<T> is exactly that it allows O(1) indexing.
That said, you could write a more LINQ-y version of the code like this:
var rangeStart = 0;
var ranges = chunkSizes.Select(n => Tuple.Create((rangeStart += n) - n, n))
.ToArray();
var lists = ranges.Select(r => source.GetRange(r.Item1, r.Item2)).ToList();
if (rangeStart < source.Count) {
    lists.Add(source.GetRange(rangeStart, source.Count - rangeStart));
}
return lists;
I would suggest using this extension method to chunk the source list into sub-lists of the specified chunk size:
using System.Collections.Generic;
using System.Linq;
...
/// <summary>
/// Helper methods for the lists.
/// </summary>
public static class ListExtensions
{
public static List<List<T>> ChunkBy<T>(this List<T> source, int chunkSize)
{
return source
.Select((x, i) => new { Index = i, Value = x })
.GroupBy(x => x.Index / chunkSize)
.Select(x => x.Select(v => v.Value).ToList())
.ToList();
}
}
Assume we have a jagged array
int[][] a = { new[] { 1, 2, 3, 4 }, new[] { 5, 6, 7, 8 }, new[] { 9, 10, 11, 12 } };
To get the sum of the second row and the sum of the second column, the following two lines can be used, respectively:
int rowSum = a[1].Sum();
int colSum = a.Select(row => row[1]).Sum();
But if we have definition of 2-dimensional array
int[,] a = { { 1, 2, 3, 4 }, { 5, 6, 7, 8 }, { 9, 10, 11, 12 } };
the above-cited code will not work due to compiler errors:
Error 1 Wrong number of indices inside []; expected 2
Error 2 'int[*,*]' does not contain a definition for 'Select' and no extension method 'Select' accepting a first argument of type 'int[*,*]' could be found (are you missing a using directive or an assembly reference?)
So, the question: how to use LINQ methods with n-dimensional arrays, but not jagged ones? And is there a method to convert a rectangular array to a jagged one?
P.S. I tried to find the answer in documentation, but without result.
LINQ to Objects is based on the IEnumerable<T> Interface, i.e. a one-dimensional sequence of values. This means it doesn't mix well with n-dimensional data structures like non-jagged arrays, although it's possible.
You can generate one-dimensional sequence of integers that index into the n-dimensional array:
int rowSum = Enumerable.Range(0, a.GetLength(1)).Sum(i => a[1, i]);
int colSum = Enumerable.Range(0, a.GetLength(0)).Sum(i => a[i, 1]);
About your question "How to use LINQ methods with n-dimensional arrays":
You can't use most LINQ methods with an n-dimensional array, because such an array only implements IEnumerable but not IEnumerable<T>, and most of the LINQ extension methods are extension methods for IEnumerable<T>.
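The escape hatch is Cast<T>, which adapts the non-generic IEnumerable to IEnumerable<T>, flattening the array in row-major order (so row/column structure is lost). A small sketch using the array from the question:

```csharp
using System.Linq;

int[,] a = { { 1, 2, 3, 4 }, { 5, 6, 7, 8 }, { 9, 10, 11, 12 } };
// Cast<int> yields the elements row by row as an IEnumerable<int>:
int total = a.Cast<int>().Sum();   // 78
int max = a.Cast<int>().Max();     // 12
```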
About the other question: See dtb's answer.
To add to dtb's solution, a more general way of iterating over all items of the array would be:
int[,] b = { { 1, 2, 3, 4 }, { 5, 6, 7, 8 }, { 9, 10, 11, 12 } };
var flattenedArray = Enumerable.Range(0, b.GetLength(0))
.SelectMany(i => Enumerable.Range(0, b.GetLength(1))
.Select(j => new { Row = i, Col = j }));
And now:
var rowSum2 = flattenedArray.Where(t => t.Row == 1).Sum(t => b[t.Row, t.Col]);
var colSum2 = flattenedArray.Where(t => t.Col == 1).Sum(t => b[t.Row, t.Col]);
Of course this is ultra-wasteful, as we are creating coordinate tuples even for those items that we will end up filtering out with Where, but if you don't know what the selection criteria will be beforehand this is the way to go (or not - this seems more like an exercise than something you'd want to do in practice).
I can also imagine how this might be extended for arrays of any rank (not just 2D) using a recursive lambda and something like Tuple, but that crosses over into masochism territory.
The 2D array doesn't have any built in way of iterating over a row or column. It's not too difficult to create your own such method though. See this class for an implementation which gets an enumerable for row and column.
public static class LINQTo2DArray
{
    public static IEnumerable<T> Row<T>(this T[,] array, int row)
    {
        for (int i = 0; i < array.GetLength(1); i++)
        {
            yield return array[row, i];
        }
    }

    public static IEnumerable<T> Column<T>(this T[,] array, int column)
    {
        for (int i = 0; i < array.GetLength(0); i++)
        {
            yield return array[i, column];
        }
    }
}
You can also flatten the array using a.Cast<int>(), but you would then lose all the info about columns/rows.
A simpler way is to do it as below:
var t = new List<Tuple<int, int>>();
int[][] a = t.Select(x => new int[]{ x.Item1, x.Item2}).ToArray();
The simplest LINQ only approach I can see to do these kinds of row and column operations on a two dimensional array is to define the following lookups:
var cols = a
.OfType<int>()
.Select((x, n) => new { x, n, })
.ToLookup(xn => xn.n % a.GetLength(1), xn => xn.x);
var rows = a
.OfType<int>()
.Select((x, n) => new { x, n, })
.ToLookup(xn => xn.n / a.GetLength(1), xn => xn.x);
Now you can simply do this:
var firstColumnSum = cols[0].Sum();
As for n-dimensional, it just gets too painful... Sorry.
I have a List< int[] > myList, where I know that all the int[] arrays are the same length - for the sake of argument, let us say I have 500 arrays, each is 2048 elements long. I'd like to sum all 500 of these arrays, to give me a single array, 2048 elements long, where each element is the sum of all the same positions in all the other arrays.
Obviously this is trivial in imperative code:
int[] sums = new int[myList[0].Length];
foreach(int[] array in myList)
{
for(int i = 0; i < sums.Length; i++)
{
sums[i] += array[i];
}
}
But I was wondering if there was a nice Linq or Enumerable.xxx technique?
Edit: Ouch...This became a bit harder while I wasn't looking. Changing requirements can be a real PITA.
Okay, so take each position in the array, and sum it:
var sums = Enumerable.Range(0, myList[0].Length)
.Select(i => myList.Select(
nums => nums[i]
).Sum()
);
That's kind of ugly...but I think the statement version would be even worse.
EDIT: I've left this here for the sake of interest, but the accepted answer is much nicer.
EDIT: Okay, my previous attempt (see edit history) was basically completely wrong...
You can do this with a single line of LINQ, but it's horrible:
var results = myList.SelectMany(array => array.Select(
                        (value, index) => new { value, index }))
                    .Aggregate(new int[myList[0].Length],
                        (result, item) => { result[item.index] += item.value; return result; });
I haven't tested it, but I think it should work. I wouldn't recommend it though. The SelectMany flattens all the data into a sequence of pairs - each pair is the value, and its index within its original array.
The Aggregate step is entirely non-pure - it modifies its accumulator as it goes, by adding the right value at the right point.
Unless anyone can think of a way of basically pivoting your original data (at which point my earlier answer is what you want) I suspect you're best off doing this the non-LINQ way.
This works with any sequences, not just arrays:
var myList = new List<int[]>
{
new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 },
new int[] { 10, 20, 30, 40, 50, 60, 70, 80, 90 }
};
var sums =
from array in myList
from valueIndex in array.Select((value, index) => new { Value = value, Index = index })
group valueIndex by valueIndex.Index into indexGroups
select indexGroups.Select(indexGroup => indexGroup.Value).Sum();
foreach(var sum in sums)
{
Console.WriteLine(sum);
}
// Prints:
//
// 11
// 22
// 33
// 44
// 55
// 66
// 77
// 88
// 99
OK, assuming the sum of the ints at each position over the list of arrays will itself fit into an int (which is a dodgy assumption, but I'll make it anyway to make the job easier):
int[] sums =
    Enumerable.Range(0, listOfArrays[0].Length)
              .Select(sumTotal =>
                  Enumerable.Range(0, listOfArrays.Count)
                            .Aggregate(0, (total, listIndex) =>
                                total + listOfArrays[listIndex][sumTotal]))
              .ToArray();
EDIT - D'oh. For some reason .Select evaded me originally. That's a bit better. It's a slight hack because sumTotal is acting as both the input (the position in the array which is used in the Aggregate call) and the output sum in the resulting IEnumerable, which is counter-intuitive.
Frankly this is far more horrible than doing it the old-fashioned way :-)
Here is one that trades LINQ statement simplicity for performance.
var colSums =
from col in array.Pivot()
select col.Sum();
public static class LinqExtensions {
public static IEnumerable<IEnumerable<T>> Pivot<T>( this IList<T[]> array ) {
for( int c = 0; c < array[ 0 ].Length; c++ )
yield return PivotColumn( array, c );
}
private static IEnumerable<T> PivotColumn<T>( IList<T[]> array, int c ) {
for( int r = 0; r < array.Count; r++ )
yield return array[ r ][ c ];
}
}
I would do it as follows … but this solution might actually be very slow, so you might want to run a benchmark before deploying it in performance-critical sections.
var result = xs.Aggregate(
(a, b) => Enumerable.Range(0, a.Length).Select(i => a[i] + b[i]).ToArray()
);
It can be done with Zip and Aggregate. The question is so old that probably Zip was not around at the time. Anyway, here is my version, hoping it will help someone.
List<int[]> myListOfIntArrays = PopulateListOfArraysOf100Ints();
int[] totals = new int[100];
int[] allArraysSum = myListOfIntArrays
    .Aggregate(
        (IEnumerable<int>)totals,
        (arrCumul, arrItem) => arrCumul.Zip(arrItem, (a, b) => a + b))
    .ToArray();