Algorithm/pattern for selecting sub-collections using LINQ and C#

Algorithm/pattern for selecting sub-collections using LINQ and C# - c#

I have a C# collection of strings. Each string is a sentence that can appear on a page. I also have a collection of page breaks which is a collection of int's. representing the index where the collection of strings are split to a new page.
Example: Each 10 items in the string collection is a page so the collection of page breaks would be a collection of int's with the values of 10, 20, 30. ...
So if there are 2 pages of strings then there will be 1 item in the page break collection and if there is 1 page then the page break collection would have zero items.
I am trying to create the following function:
List<string> GetPage(List<string> docList, List<int> pageBreakList, int pageNum)
{
// This function returns a subset of docList - just the page requested
}
I've taken a few stabs at writing this function and keep on coming up with complex if and switch statements to take into account single and two page documents and page numbers being requested outside the range (e.g. last page should be returned if page number is greater than number of pages and first page if page number is 0 or less).
My struggle with this problem leads me to ask the question: Is there a well known pattern or algorithm to address this type of subset query?

Not sure what the list of page breaks is for. I would think of it this way. A collection of strings, a page number, and the size of the page. Then you could do something like:
List<string> strings = ...
int pageNum = ...
int pageSze = ...
if (pageNum < 1) pageNum = 1;
if (pageSize < 1) pageSize = 1;
List<string> pageOfStrings = strings.Skip( pageSize*(pageNum-1) ).Take( pageSize ).ToList();
In the case where the number of pages vary per page as per your comment, try something like below. You may need to adjust the edge condition checking...
List<string> strings = ...
List<int> sizes = ...
int pageNum = ...
int itemsToSkip = 0;
int itemsToTake = 1;
if (pageNum > 1)
{
sizes.Take( pageNum - 2).Sum();
if (pageNum <= sizes.Count)
{
itemsToTake = sizes[pageNum-1]
}
{
List<string> pageOfStrings = strings.Skip( itemsToSkip ).Take( itemsToTake );

"Pure" Linq isn't a good fit for this problem. The best fit is to rely on the methods and properties of List(T). There aren't -that- many special cases.
//pageNum is zero-based.
List<string> GetPage(List<string> docList, List<int> pageBreaks, int pageNum)
{
// 0 page case
if (pageBreaks.Count != 0)
{
return docList;
}
int lastPage = pageBreaks.Count;
//requestedPage is after the lastPage case
if (requestedPage > lastPage)
{
requestedPage = lastPage;
}
int firstLine = requestedPage == 0 ? 0 :
pageBreaks[requestedPage-1];
int lastLine = requestedPage == lastPage ? docList.Count :
pageBreaks[requestedPage];
//lastLine is excluded. 6 - 3 = 3 - 3, 4, 5
int howManyLines = lastLine - firstLine;
return docList.GetRange(firstLine, howManyLines);
}
You don't want to replace the .Count property with linq's .Count() method.
You don't want to replace the .GetRange() method with linq's .Skip(n).Take(m) methods.
Linq would be a better fit if you wanted to project these collections into other collections:
IEnumerable<Page> pages =
Enumerable.Repeat(0, 1)
.Concat(pageBreaks)
.Select
(
(p, i) => new Page()
{
PageNumber = i,
Lines =
docList.GetRange(p, ((i != pageBreaks.Count) ? pageBreaks[i] : docList.Count) - p)
}
);

Related

Split a list into n equal parts

Given a sorted list, and a variable n, I want to break up the list into n parts. With n = 3, I expect three lists, with the last one taking on the overflow.
I expect: 0,1,2,3,4,5, 6,7,8,9,10,11, 12,13,14,15,16,17
If the number of items in the list is not divisible by n, then just put the overflow (mod n) in the last list.
This doesn't work:
static class Program
{
static void Main(string[] args)
{
var input = new List<double>();
for (int k = 0; k < 18; ++k)
{
input.Add(k);
}
var result = input.Split(3);
foreach (var resul in result)
{
foreach (var res in resul)
{
Console.WriteLine(res);
}
}
}
}
static class LinqExtensions
{
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> list, int parts)
{
int i = 0;
var splits = from item in list
group item by i++ % parts into part
select part.AsEnumerable();
return splits;
}
}

I think you would benefit from Linq's .Chunk() method.
If you first calculate how many parts will contain the equal item count, you can chunk list and yield return each chunk, before yield returning the remaining part of list (if list is not divisible by n).
As pointed out by Enigmativity, list should be materialized as an ICollection<T> to avoid possible multiple enumeration. The materialization can be obtained by trying to cast list to an ICollection<T>, and falling back to calling list.ToList() if that is unsuccessful.
A possible implementation of your extension method is hence:
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> list, int parts)
{
var collection = list is ICollection<T> c
? c
: list.ToList();
var itemCount = collection.Count;
// return all items if source list is too short to split up
if (itemCount < parts)
{
yield return collection;
yield break;
}
var itemsInEachChunk = itemCount / parts;
var chunks = itemCount % parts == 0
? parts
: parts - 1;
var itemsToChunk = chunks * itemsInEachChunk;
foreach (var chunk in collection.Take(itemsToChunk).Chunk(itemsInEachChunk))
{
yield return chunk;
}
if (itemsToChunk < itemCount)
{
yield return collection.Skip(itemsToChunk);
}
}
Example fiddle here.

I see two issues with your code. First, the way you're outputting the results, it's impossible to tell the groupings of the values since you're just outputing each one on its own line.
This could be resolved buy using Console.Write for each value in a group, and then adding a Console.WriteLine() when the group is done. This way the values from each group are displayed on a separate line. We also might want to pad the values so they line up nicely by getting the length of the largest value and passing that to the PadRight method:
static void Main(string[] args)
{
var numItems = 18;
var splitBy = 3;
var input = Enumerable.Range(0, numItems).ToList();
var results = input.Split(splitBy);
// Get the length of the largest value to use for padding smaller values,
// so all the columns will line up when we display the results
var padValue = input.Max().ToString().Length + 1;
foreach (var group in results)
{
foreach (var item in group)
{
Console.Write($"{item}".PadRight(padValue));
}
Console.WriteLine();
}
Console.Write("\n\nDone. Press any key to exit...");
Console.ReadKey();
}
Now your results look pretty good, except we can see that the numbers are not grouped as we expect:
0 3 6 9 12 15
1 4 7 10 13 16
2 5 8 11 14 17
The reason for this is that we're grouping by the remainder of each item divided by the number of parts. So, the first group contains all numbers whose remainder after being divided by 3 is 0, the second is all items whose remainder is 1, etc.
To resolve this, we should instead divide the index of the item by the number of items in a row (the number of columns).
In other words, 18 items divided by 3 rows will result in 6 items per row. With integer division, all the indexes from 0 to 5 will have a remainder of 0 when divided by 6, all the indexes from 6 to 11 will have a remainder of 1 when divided by 6, and all the indexes from 12 to 17 will have a remainder of 2 when divided by 6.
However, we also have to be able to handle the overflow numbers. One way to do this is to check if the index is greater than or equal to rows * columns (i.e. it would end up on a new row instead of on the last row). If this is true, then we set it to the last row.
I'm not great at linq so there may be a better way to write this, but we can modify our extension method like so:
public static IEnumerable<IEnumerable<T>> Split<T>(
this IEnumerable<T> list, int parts)
{
int numItems = list.Count();
int columns = numItems / parts;
int overflow = numItems % parts;
int index = 0;
return from item in list
group item by
index++ >= (parts * columns) ? parts - 1 : (index - 1) / columns
into part
select part.AsEnumerable();
}
And now our results look better:
// For 18 items split into 3
0 1 2 3 4 5
6 7 8 9 10 11
12 13 14 15 16 17
// For 25 items split into 7
0 1 2
3 4 5
6 7 8
9 10 11
12 13 14
15 16 17
18 19 20 21 22 23 24

This should work.
So each list should (ideally) have x/n elements,
where x=> No. of elements in the list &
n=> No. of lists it has to be split into
If x isn't divisible by n, then each list should have x/n (rounded down to the nearest integer). Let that no. be 'y'. While the last list should have x - y*(n - 1). Let that no. be 'z'.
What the first for-loop does is it repeats the process of creating a list with the appropriate no. of elements n times.
The if-else block is there to see if the list getting created is the last one or not. If it's the last one, it has z items. If not, it has y items.
The nested for-loops add the list items into "sub-lists" (List) which will then be added to the main list (List<List>) that is to be returned.
This solution is (noticeably) different from your signature and the other solutions offered. I used this approach because the code is (arguably) easier to understand albeit longer. When I used to look for solutions, I used to apply solutions where I could understand exactly what was going on. I wasn't able to fully understand the other solutions to this question (yet to get a proper hang of programming) so I presented the one I wrote below in case you were to end up in the same predicament.
Let me know if I should make any changes.
static class Program
{
static void Main(string[] args)
{
var input = new List<String>();
for (int k = 0; k < 18; ++k)
{
input.Add(k.ToString());
}
var result = SplitList(input, 5);//I've used 5 but it can be any number
foreach (var resul in result)
{
foreach (var res in result)
{
Console.WriteLine(res);
}
}
}
public static List<List<string>> SplitList (List<string> origList, int n)
{//"n" is the number of parts you want to split your list into
int splitLength = origList.Count / n; //splitLength is no. of items in each list bar the last one. (In case of overflow)
List<List<string>> listCollection = new List<List<string>>();
for ( int i = 0; i < n; i++ )
{
List<string> tempStrList = new List<string>();
if ( i < n - 1 )
{
for ( int j = i * splitLength; j < (i + 1) * splitLength; j++ )
{
tempStrList.Add(origList[j]);
}
}
else
{
for ( int j = i * splitLength; j < origList.Count; j++ )
{
tempStrList.Add(origList[j]);
}
}
listCollection.Add(tempStrList);
}
return listCollection;
}
}

How to Order By or Sort an integer List and select the Nth element

I have a list, and I want to select the fifth highest element from it:
List<int> list = new List<int>();
list.Add(2);
list.Add(18);
list.Add(21);
list.Add(10);
list.Add(20);
list.Add(80);
list.Add(23);
list.Add(81);
list.Add(27);
list.Add(85);
But OrderbyDescending is not working for this int list...

int fifth = list.OrderByDescending(x => x).Skip(4).First();

Depending on the severity of the list not having more than 5 elements you have 2 options.
If the list never should be over 5 i would catch it as an exception:
int fifth;
try
{
fifth = list.OrderByDescending(x => x).ElementAt(4);
}
catch (ArgumentOutOfRangeException)
{
//Handle the exception
}
If you expect that it will be less than 5 elements then you could leave it as default and check it for that.
int fifth = list.OrderByDescending(x => x).ElementAtOrDefault(4);
if (fifth == 0)
{
//handle default
}
This is still some what flawed because you could end up having the fifth element being 0. This can be solved by typecasting the list into a list of nullable ints at before the linq:
var newList = list.Select(i => (int?)i).ToList();
int? fifth = newList.OrderByDescending(x => x).ElementAtOrDefault(4);
if (fifth == null)
{
//handle default
}

Without LINQ expressions:
int result;
if(list != null && list.Count >= 5)
{
list.Sort();
result = list[list.Count - 5];
}
else // define behavior when list is null OR has less than 5 elements
This has a better performance compared to LINQ expressions, although the LINQ solutions presented in my second answer are comfortable and reliable.
In case you need extreme performance for a huge List of integers, I'd recommend a more specialized algorithm, like in Matthew Watson's answer.
Attention: The List gets modified when the Sort() method is called. If you don't want that, you must work with a copy of your list, like this:
List<int> copy = new List<int>(original);
List<int> copy = original.ToList();

The easiest way to do this is to just sort the data and take N items from the front. This is the recommended way for small data sets - anything more complicated is just not worth it otherwise.
However, for large data sets it can be a lot quicker to do what's known as a Partial Sort.
There are two main ways to do this: Use a heap, or use a specialised quicksort.
The article I linked describes how to use a heap. I shall present a partial sort below:
public static IList<T> PartialSort<T>(IList<T> data, int k) where T : IComparable<T>
{
int start = 0;
int end = data.Count - 1;
while (end > start)
{
var index = partition(data, start, end);
var rank = index + 1;
if (rank >= k)
{
end = index - 1;
}
else if ((index - start) > (end - index))
{
quickSort(data, index + 1, end);
end = index - 1;
}
else
{
quickSort(data, start, index - 1);
start = index + 1;
}
}
return data;
}
static int partition<T>(IList<T> lst, int start, int end) where T : IComparable<T>
{
T x = lst[start];
int i = start;
for (int j = start + 1; j <= end; j++)
{
if (lst[j].CompareTo(x) < 0) // Or "> 0" to reverse sort order.
{
i = i + 1;
swap(lst, i, j);
}
}
swap(lst, start, i);
return i;
}
static void swap<T>(IList<T> lst, int p, int q)
{
T temp = lst[p];
lst[p] = lst[q];
lst[q] = temp;
}
static void quickSort<T>(IList<T> lst, int start, int end) where T : IComparable<T>
{
if (start >= end)
return;
int index = partition(lst, start, end);
quickSort(lst, start, index - 1);
quickSort(lst, index + 1, end);
}
Then to access the 5th largest element in a list you could do this:
PartialSort(list, 5);
Console.WriteLine(list[4]);
For large data sets, a partial sort can be significantly faster than a full sort.
Addendum
See here for another (probably better) solution that uses a QuickSelect algorithm.

This LINQ approach retrieves the 5th biggest element OR throws an exception WHEN the list is null or contains less than 5 elements:
int fifth = list?.Count >= 5 ?
list.OrderByDescending(x => x).Take(5).Last() :
throw new Exception("list is null OR has not enough elements");
This one retrieves the 5th biggest element OR null WHEN the list is null or contains less than 5 elements:
int? fifth = list?.Count >= 5 ?
list.OrderByDescending(x => x).Take(5).Last() :
default(int?);
if(fifth == null) // define behavior
This one retrieves the 5th biggest element OR the smallest element WHEN the list contains less than 5 elements:
if(list == null || list.Count <= 0)
throw new Exception("Unable to retrieve Nth biggest element");
int fifth = list.OrderByDescending(x => x).Take(5).Last();
All these solutions are reliable, they should NEVER throw "unexpected" exceptions.
PS: I'm using .NET 4.7 in this answer.

Here there is a C# implementation of the QuickSelect algorithm to select the nth element in an unordered IList<>.
You have to put all the code contained in that page in a static class, like:
public static class QuickHelpers
{
// Put the code here
}
Given that "library" (in truth a big fat block of code), then you can:
int resA = list.QuickSelect(2, (x, y) => Comparer<int>.Default.Compare(y, x));
int resB = list.QuickSelect(list.Count - 1 - 2);
Now... Normally the QuickSelect would select the nth lowest element. We reverse it in two ways:
For resA we create a reverse comparer based on the default int comparer. We do this by reversing the parameters of the Compare method. Note that the index is 0 based. So there is a 0th, 1th, 2th and so on.
For resB we use the fact that the 0th element is the list-1 th element in the reverse order. So we count from the back. The highest element would be the list.Count - 1 in an ordered list, the next one list.Count - 1 - 1, then list.Count - 1 - 2 and so on
Theorically using Quicksort should be better than ordering the list and then picking the nth element, because ordering a list is on average a O(NlogN) operation and picking the nth element is then a O(1) operation, so the composite is O(NlogN) operation, while QuickSelect is on average a O(N) operation. Clearly there is a but. The O notation doesn't show the k factor... So a O(k1 * NlogN) with a small k1 could be better than a O(k2 * N) with a big k2. Only multiple real life benchmarks can tell us (you) what is better, and it depends on the size of the collection.
A small note about the algorithm:
As with quicksort, quickselect is generally implemented as an in-place algorithm, and beyond selecting the k'th element, it also partially sorts the data. See selection algorithm for further discussion of the connection with sorting.
So it modifies the ordering of the original list.

For loop with 20 items skip each time

I would like to write a piece of for loop which would go through an existing list and take 20 items out of that list each time it iterates.
So something like this:
If filteredList list contains let's say 68 items...
First 3 loops would take 20 times each and then in last 4th iteration it would take the rest of the 8 items that are residing in the list...
I have written something like this:
var allResponses= new List<string>();
for (int i = 0; i < filteredList.Count(); i++)
{
allResponses.Add(GetResponse(filteredList.Take(20).ToList()));
}
Where assuming filteredList is a list that contains 68 items. I figured that this is not a way to go because I don't want to loop to the collections size, but instead of 68 times, it should be 4 times and that I take 20 items out of the list each time... How could I do this?

You are pretty close - just add a call to Skip, and divide Count by 20 with rounding up:
var allResponses= new List<string>();
for (int i = 0; i < (filteredList.Count+19) / 20; i++) {
allResponses.Add(GetResponse(filteredList.Skip(i*20).Take(20).ToList()));
}
The "add 19, divide by 20" trick provides an idiomatic way of taking the "ceiling" of integer division, instead of the "floor".
Edit: Even better (Thanks to Thomas Ayoub)
var allResponses= new List<string>();
for (int i = 0 ; i < filteredList.Count ; i = i + 20) {
allResponses.Add(GetResponse(filteredList.Skip(i).Take(20).ToList()));
}

You simply have to calculate the number of pages:
const in PAGE_SIZE = 20;
int pages = filteredList.Count() / PAGE_SIZE
+ (filteredList.Count() % PAGE_SIZE > 0 ? 1 : 0)
;
The last part makes sure that with 21, there will be added 1 page above the previously calculated page size (since 21/20 = 1).
Or, when using MoreLINQ, you could simply use a Batch call.

I suggest a simple loop with page adding on 0, 20, 40... iterations without Linq and complex modular operations:
int pageSize = 20;
List<String> page = null;
for (int i = 0; i < filteredList.Count; ++i) {
// if page reach pageSize, add a new one
if (i % pageSize == 0) {
page = new List<String>(pageSize);
allResponses.Add(page);
}
page.Add(filteredList[i]);
}

i needed to do same but i needed to skip 10 after each element so i wrote this simple code
List<Location2D> points = new List<Location2D>();
points.ForEach(p =>
{
points.AddRange(path.Skip((int)points.Count * 10).Take(1));
});
if (!points.Contains(path.LastOrDefault()))
points.Add(path.LastOrDefault());

Loop Through Every Possible Combination of an Array

I am attempting to loop through every combination of an array in C# dependent on size, but not order. For example: var states = ["NJ", "AK", "NY"];
Some Combinations might be:
states = [];
states = ["NJ"];
states = ["NJ","NY"];
states = ["NY"];
states = ["NJ", "NY", "AK"];
and so on...
It is also true in my case that states = ["NJ","NY"] and states = ["NY","NJ"] are the same thing, as order does not matter.
Does anyone have any idea on the most efficient way to do this?

The combination of the following two methods should do what you want. The idea is that if the number of items is n then the number of subsets is 2^n. And if you iterate from 0 to 2^n - 1 and look at the numbers in binary you'll have one digit for each item and if the digit is 1 then you include the item, and if it is 0 you don't. I'm using BigInteger here as int would only work for a collection of less than 32 items and long would only work for less than 64.
public static IEnumerable<IEnumerable<T>> PowerSets<T>(this IList<T> set)
{
var totalSets = BigInteger.Pow(2, set.Count);
for (BigInteger i = 0; i < totalSets; i++)
{
yield return set.SubSet(i);
}
}
public static IEnumerable<T> SubSet<T>(this IList<T> set, BigInteger n)
{
for (int i = 0; i < set.Count && n > 0; i++)
{
if ((n & 1) == 1)
{
yield return set[i];
}
n = n >> 1;
}
}
With that the following code
var states = new[] { "NJ", "AK", "NY" };
foreach (var subset in states.PowerSets())
{
Console.WriteLine("[" + string.Join(",", subset.Select(s => "'" + s + "'")) + "]");
}
Will give you this output.
[]
['NJ']
['AK']
['NJ','AK']
['NY']
['NJ','NY']
['AK','NY']
['NJ','AK','NY']

You can use back-tracking where in each iteration you'll either (1) take the item in index i or (2) do not take the item in index i.
Pseudo code for this problem:
Main code:
Define a bool array (e.g. named picked) of the length of states
Call the backtracking method with index 0 (the first item)
Backtracking function:
Receives the states array, its length, the current index and the bool array
Halt condition - if the current index is equal to the length then you just need to iterate over the bool array and for each item which is true print the matching string from states
Actual backtracking:
Set picked[i] to true and call the backtracking function with the next index
Set picked[i] to false and call the backtracking function with the next index

List.IndexOf() - return index of final occurrence rather than the first?

int highestValue = someList.IndexOf(someList.Max())
someList contains a lot of duplicates and someList.Max() returns the index of the first instance of the highest value.
Is there some trickery I can use (reversing the order of the list?) to get the index of the final occurrence of the highest value in the list, rather than resorting to writing a manual method?

Try this:
int highestValue = someList.LastIndexOf(someList.Max()) ;

All the other answers being completely correct, it must be noted that this requires 2 iterations over the list (one to find the max element, second to find the last index). For a list of integers that's a non-issue, but if the iteration was more complicated, here's an alternative:
var highestValue = someList.Select((val, ind) => new { Value = val, Index = ind })
.Aggregate((x, y) => (x.Value > y.Value) ? x : y)
.Index;

You mean like getting the index of the last occurrence? That would be:
int highestValueIndex = someList.LastIndexOf(someList.Max())
You should, however, be aware of the fact that you're making two passes over the data in both your original code and the code above. If you want to do it in a single pass (and you should only worry about this if your data sets are large), you can do this with something like:
static int LastIndexOfMax(List<int> list)
{
// Empty list, no index.
if (list.Count == 0) return -1;
// Default to first element then check all others.
int maxIdx = 0, maxVal = list[0];
for (int idx = 1; idx < list.Count; ++idx) {
// Higher or equal-and-to-the-right, replace.
if (list[idx] >= maxVal) {
maxIdx = idx;
maxVal = list[idx];
}
}
return maxIdx;
}

Use LastIndexOf
int highestValue = someList.LastIndexOf(someList.Max());

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Algorithm/pattern for selecting sub-collections using LINQ and C# - c#

Related

Split a list into n equal parts

How to Order By or Sort an integer List and select the Nth element

For loop with 20 items skip each time

Loop Through Every Possible Combination of an Array

List.IndexOf() - return index of final occurrence rather than the first?

Categories

Resources