Find all Combinations of a sorted sequence of characters - c#

Given the characters a, b, c, d, I want to find the following combinations.
The sequence is always sorted. How should I approach finding them?
a
b
c
d
ab
ac
ad
bc
bd
cd
abc
abd
acd
bcd
abcd

What you want is every single Combination. Normally when getting combinations you get all combinations of a particular size, n. We'll start out by creating that method to get the combinations of size n from a sequence:
public static IEnumerable<IEnumerable<T>> Combinations<T>(
this IEnumerable<T> source, int n)
{
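if (n == 0)
{
// The only combination of size 0 is the empty one; stop here so the
// loop below never recurses with a negative n.
yield return Enumerable.Empty<T>();
yield break;
}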
int count = 1;
foreach (T item in source)
{
foreach (var innerSequence in source.Skip(count).Combinations(n - 1))
{
yield return new T[] { item }.Concat(innerSequence);
}
count++;
}
}
Once you have that it's a simple matter of getting the combinations of n for all n from 1 to the size of the sequence:
public static IEnumerable<IEnumerable<T>> AllCombinations<T>(this IList<T> source)
{
IEnumerable<IEnumerable<T>> output = Enumerable.Empty<IEnumerable<T>>();
for (int i = 0; i < source.Count; i++)
{
output = output.Concat(source.Combinations(i));
}
return output;
}
Some sample code that uses it:
var list = new List<string> { "a", "b", "c", "d" };
foreach (var sequence in list.AllCombinations())
{
Console.WriteLine(string.Join(" ", sequence));
}
It's worth noting that this operation is extraordinarily expensive for all but the tiniest input sequences, because a sequence of n items has 2^n - 1 non-empty combinations. This implementation isn't the most efficient around, but even if you do eke out every last bit of performance you won't be able to compute the combinations of sequences of more than 15-20 items, depending on how long you're willing to wait and how good your computer is.
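For reference, here is a self-contained sketch that puts the two methods together. It loops over sizes 1 through Count, which is an assumption about the intent based on the output listed in the question, and the class name CombinationExtensions is only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CombinationExtensions
{
    // All combinations of exactly n items, preserving the order of the source.
    public static IEnumerable<IEnumerable<T>> Combinations<T>(
        this IEnumerable<T> source, int n)
    {
        if (n == 0)
        {
            yield return Enumerable.Empty<T>();
            yield break;
        }
        int count = 1;
        foreach (T item in source)
        {
            // Pair the current item with every smaller combination drawn
            // from the items that come after it.
            foreach (var inner in source.Skip(count).Combinations(n - 1))
            {
                yield return new[] { item }.Concat(inner);
            }
            count++;
        }
    }

    // Assumption: sizes run from 1 to Count inclusive, so the empty
    // combination is skipped and the full sequence is included.
    public static IEnumerable<IEnumerable<T>> AllCombinations<T>(this IList<T> source)
    {
        return Enumerable.Range(1, source.Count)
            .SelectMany(size => source.Combinations(size));
    }
}
```

For a list containing "a", "b", "c", "d" this yields 15 sequences (2^4 - 1), starting with the four single letters and ending with a b c d.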

You can use the Combinatorics library to calculate them for you (documentation), but as Servy said, length of the data is a major factor in how long it will take.

I have written a class to handle common functions for working with the binomial coefficient, which is the type of problem that your problem falls under. I have not looked at the Combinatorics library suggested by @Bobson, but I believe my class is probably much faster and more efficient. It performs the following tasks:
Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters.
Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle. My paper talks about this. I believe I am the first to discover and publish this technique, but I could be wrong.
Converts the index in a sorted binomial coefficient table to the corresponding K-indexes. I believe it might be faster than the link you have found.
Uses Mark Dominus method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to perform the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.
To read about this class and download the code, see Tablizing The Binomial Coeffieicent.
The solution to your problem involves generating the K-indexes for each N choose K case. So in your example above where there are 4 possibilities for N (A, B, C, D) the code (in C#) would look something like this:
int TotalNumberOfValuesInSet = 4;
int N = TotalNumberOfValuesInSet;
// Loop thru all the possible groups of combinations.
for (int K = 1; K < N; K++)
{
// Create the bin coeff object required to get all
// the combos for this N choose K combination.
BinCoeff<int> BC = new BinCoeff<int>(N, K, false);
int NumCombos = BinCoeff<int>.GetBinCoeff(N, K);
int[] KIndexes = new int[K];
// Loop thru all the combinations for this N choose K case.
for (int Combo = 0; Combo < NumCombos; Combo++)
{
// Get the k-indexes for this combination, which in this case
// are the indexes to each letter in the set starting with index zero.
BC.GetKIndexes(Combo, KIndexes);
// Do whatever processing that needs to be done with the indices in KIndexes.
...
}
}
// Handle the final combination, which in this case is ABCD, since the loop above stops at K < N.
...
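The class itself isn't reproduced here, so the following is only a rough sketch of the kind of overflow-resistant binomial calculation mentioned above: multiply and divide alternately so intermediate values stay close to the final result. The names Binomial and Choose are hypothetical and are not part of the BinCoeff class.

```csharp
using System;

public static class Binomial
{
    // n choose k, dividing at every step so the running value never
    // grows much beyond the final result.
    public static long Choose(int n, int k)
    {
        if (k < 0 || k > n) return 0;
        long result = 1;
        long numerator = n;
        for (long d = 1; d <= k; d++)
        {
            // After this step, result equals (n choose d), so the
            // division is always exact.
            result = result * numerator / d;
            numerator--;
        }
        return result;
    }
}
```

Summing Choose(4, K) for K from 1 to 4 gives 4 + 6 + 4 + 1 = 15, which matches the number of combinations listed in the question.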

They're not truly subsets because there's nothing to stop your input sequence from containing duplicates, but the following extension method should work in the general case:
public static IEnumerable<IEnumerable<T>> Subsets<T>(this IEnumerable<T> source)
{
List<T[]> yielded = new List<T[]> { new T[0] };
foreach(T t in source)
{
List<T[]> newlyYielded = new List<T[]>();
foreach(var y in yielded)
{
var newSubset = y.Concat(new[] {t}).ToArray();
newlyYielded.Add(newSubset);
yield return newSubset;
}
yielded.AddRange(newlyYielded);
}
}
Basically starting with an empty sequence, it adds the empty sequence with the first item appended. Then for each of those two sequences, it adds that sequence with the next item appended. Then for each of those four sequences...
This has to keep a copy of each sequence generated, so will use a lot of memory.
To get strings out of string, you can call this as
"abcd".Subsets().Select(chars => new string(chars.ToArray()))
If you're not going to have many characters, you could take advantage of the fact that you can calculate the nth subset directly:
public static int SubsetCount(this string s)
{
// 2^Length subsets in total, including the empty one (n = 0).
return 1 << s.Length;
}
public static string NthSubset(this string s, int n)
{
var b = new StringBuilder();
int i = 0;
while (n > 0)
{
if ((n&1)==1) b.Append(s[i]);
i++;
n >>= 1;
}
return b.ToString();
}
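As a sketch of how the nth-subset idea could reproduce the listing from the question: number the non-empty subsets from 1 to 2^Length - 1, build each one from the bits of its number, then sort by length. The class and method names are illustrative, and the approach assumes the input has fewer than 31 characters so the count fits in an int.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class BitmaskSubsets
{
    // Builds the subset whose membership is given by the bits of n:
    // bit i set means s[i] is included.
    public static string NthSubset(string s, int n)
    {
        var b = new StringBuilder();
        for (int i = 0; n > 0; i++, n >>= 1)
        {
            if ((n & 1) == 1) b.Append(s[i]);
        }
        return b.ToString();
    }

    // All non-empty subsets, shortest first, ties broken ordinally.
    public static List<string> AllNonEmpty(string s)
    {
        int count = 1 << s.Length; // includes the empty subset at n = 0
        return Enumerable.Range(1, count - 1)
            .Select(n => NthSubset(s, n))
            .OrderBy(subset => subset.Length)
            .ThenBy(subset => subset, StringComparer.Ordinal)
            .ToList();
    }
}
```

Because each subset is computed from its number alone, nothing has to be kept in memory except the results you choose to collect; the sort is only there to match the order in the question.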

Servy's code above is quite elegant, but it doesn't produce the combinations that have the same length as the source.
for (int i = 0; i < source.Count; i++) should be
for (int i = 0; i <= source.Count; i++).
Below is the vb.net variant, which can't use yield.
<Extension()>
Public Function Combinations(Of T)(source As IEnumerable(Of T), n As Integer) As IEnumerable(Of IEnumerable(Of T))
Dim lstResults As New List(Of IEnumerable(Of T))
If n = 0 Then
lstResults.Add(Enumerable.Empty(Of T))
Else
Dim count As Integer = 1
For Each item As T In source
For Each innerSequence In source.Skip(count).Combinations(n - 1)
lstResults.Add(New T() {item}.Concat(innerSequence))
Next
count += 1
Next
End If
Return lstResults
End Function
<Extension()>
Public Function AllCombinations(Of T)(source As IList(Of T)) As IEnumerable(Of IEnumerable(Of T))
Dim output As IEnumerable(Of IEnumerable(Of T)) = Enumerable.Empty(Of IEnumerable(Of T))()
For i As Integer = 0 To source.Count
output = output.Concat(source.Combinations(i))
Next
Return output
End Function

Related

Every combination of integers between specific range in combination with a list of strings

I have the following problem:
I have three elements a, b and c. And also integers from 0 to 100. How can I get all the possible combinations to look like:
a 0 b 0 c 0
a 1 b 0 c 0
a 0 b 1 c 0
a 0 b 0 c 1
a 1 b 1 c 0
...
a 100 b 100 c 100
and so on? I am using C# but I am rather struggling to find the correct algorithm independently of programming language. Unfortunately I do not really understand Cartesian products etc.
You say you want to
find the correct algorithm independently of programming language
So I shall try to answer this using the minimum of programming language features. The example I shall give assumes the programming language has expandable lists, arrays, arrays of arrays and the ability to shallow clone an array. These are common programming features, so hopefully this will be OK.
To solve this problem, you need to produce all the combinations of 3 sets of N integers where each set consists of the integers from 0..N-1. (The set of combinations of a set of sets - which is what this is - is called the Cartesian Product of those sets.)
The solution below uses recursion, but we don't need to worry about stack overflow because the stack depth does not exceed the number of sets to combine - in this case, 3. (Normally with recursion you would try to use a stack class to manage it, but that makes the code more complicated.)
How it works:
combine() recursively iterates through all elements of each set, and at each level of recursion it begins processing the elements of the next set.
So the outer level of recursion begins iterating over all the elements of set[0], and for each element it fills in the next item of the current combination with that element.
Then: if that was the last set, the combination is complete and it is output. Otherwise: a recursive call is made to start filling in the elements from the next set.
Once we have all the combinations, we can just iterate through them and intersperse them
with a, b and c as per your requirement.
Putting this together:
using System;
using System.Collections.Generic;
namespace ConsoleApp1
{
class Program
{
static void Main()
{
var sets = createSets(3, 10);
var combinations = Combinations(sets);
foreach (var combination in combinations)
{
Console.WriteLine($"a {combination[0]} b {combination[1]} c {combination[2]}");
}
}
static int[][] createSets(int numSets, int intsPerSet)
{
int[][] sets = new int[numSets][];
// All the sets are the same, so we can just use copies of it rather than create multiples.
int[] oneSet = new int[intsPerSet];
for (int i = 0; i < intsPerSet; ++i)
oneSet[i] = i;
for (int i = 0; i < numSets; ++i)
sets[i] = oneSet;
return sets;
}
public static List<int[]> Combinations(int[][] sets)
{
var result = new List<int[]>();
combine(sets, 0, new int[sets.Length], result);
return result;
}
static void combine(int[][] sets, int set, int[] combination, List<int[]> output)
{
for (int i = 0; i < sets[set].Length; ++i)
{
combination[set] = sets[set][i];
if (set < (sets.Length - 1))
combine(sets, set + 1, combination, output);
else
output.Add((int[])combination.Clone());
}
}
}
}
Notes
This is an inefficient implementation because it returns all the combinations in one huge list. I kept it this way for simplicity (and to reduce the number of program language features required for its implementation). A better solution in C# would be to pass in an Action<int[]> to be called with each combination - then the results wouldn't need to be returned via a huge list.
This doesn't produce the results in the same order as your sample output. I have assumed that this doesn't matter!
A great LINQ implementation of the Cartesian Product is presented by Eric Lippert here. I highly recommend reading it!
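The Action<int[]> variant suggested in the notes might look roughly like this. It is a sketch rather than the answer's own code: the names CartesianCallback and ForEachCombination are made up, and the callback receives the same buffer on every call, so it must copy the array if it wants to keep it.

```csharp
using System;

public static class CartesianCallback
{
    // Invokes onCombination once per combination instead of collecting
    // them all in a list.
    public static void ForEachCombination(int[][] sets, Action<int[]> onCombination)
    {
        if (sets.Length == 0) return;
        Combine(sets, 0, new int[sets.Length], onCombination);
    }

    private static void Combine(int[][] sets, int set, int[] combination,
                                Action<int[]> onCombination)
    {
        for (int i = 0; i < sets[set].Length; ++i)
        {
            combination[set] = sets[set][i];
            if (set < sets.Length - 1)
                Combine(sets, set + 1, combination, onCombination);
            else
                onCombination(combination); // buffer is reused; clone to keep it
        }
    }
}
```

Passing a callback such as c => Console.WriteLine($"a {c[0]} b {c[1]} c {c[2]}") prints each line as it is generated, so memory use no longer grows with the number of combinations.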
If the order of the output doesn't matter, this should be enough:
for(int i = 0; i <= 100; i++){
for(int j = 0; j <= 100; j++){
for(int k = 0; k <= 100; k++){
Console.WriteLine($"a {i} b {j} c {k} ");
}
}
}
OUTPUT
a 0 b 0 c 0
a 0 b 0 c 1
a 0 b 0 c 2
a 0 b 0 c 3
...
a 100 b 100 c 100

How to achieve O(n) worst-case time complexity for this function?

I'm having issues with a certain task. It's not a homework or anything, it's rather a personal matter now. And I want to know if there's even a solution for this...
The point is to achieve expected O(n) worst-case time complexity for a function that takes two string arrays as input (let's call the first one A and the second one B) and returns an array of integers, where each element is the index of a corresponding element in array A.
So, this is what the function should look like:
private static int[] GetExistingStrings(string[] A, string[] B) { ... }
Array A contains all possible names
Array B contains names which should be excluded (i.e. if some of the names stored in B are also in A, their indices should not be included in the output int[] array). B may also contain arbitrary strings that are not present in A, or it may even be empty.
For example, if we have these arrays:
string[] A = { "one", "two", "three", "four" }; // 0, 1, 2, 3
string[] B = { "two", "three" }; // Indices of "two" and "three" not taken into account
The function should return:
int[] result = { 0, 3 }; // Indices of "one" and "four"
At first, I tried doing it the obvious and simple way (with nested for-loops):
private static int[] GetExistingStrings(string[] A, string[] B)
{
LinkedList<int> aIndices = new LinkedList<int>();
for (int n = 0; n < A.Length; n++)
{
bool isExcluded = false;
for (int m = 0; m < B.Length; m++)
{
if (A[n].Equals(B[m]))
{
isExcluded = true;
break;
}
}
if (!isExcluded)
{
aIndices.AddLast(n);
}
}
int[] resultArray = new int[aIndices.Count];
aIndices.CopyTo(resultArray, 0);
return resultArray;
}
I used LinkedList because we can't know in advance what the output array's size should be, and also because adding new nodes to this list is a constant O(1) operation. The problem here, of course, is that this function (as I assume) has O(n*m) time complexity. So, we need to find another way...
My second approach was:
private static int[] GetExistingStrings(string[] A, string[] B)
{
int n = A.Length;
int m = B.Length;
if (m == 0)
{
return GetDefaultOutputArray(n);
}
HashSet<string> bSet = new HashSet<string>(B);
LinkedList<int> aIndices = new LinkedList<int>();
for (int i = 0; i < n; i++)
{
if (!bSet.Contains(A[i]))
{
aIndices.AddLast(i);
}
}
if (aIndices.Count > 0)
{
int[] result = new int[aIndices.Count];
aIndices.CopyTo(result, 0);
return result;
}
return GetDefaultOutputArray(n);
}
// Just a utility function that returns a default array
// with length "arrayLength", where first element is 0, next one is 1 and so on...
private static int[] GetDefaultOutputArray(int arrayLength)
{
int[] array = new int[arrayLength];
for (int i = 0; i < arrayLength; i++)
{
array[i] = i;
}
return array;
}
Here the idea was to add all elements of the B array to a HashSet and then use its Contains() method to check for membership in a for-loop. But I can't quite calculate the time complexity of this function... I know for sure that the code in the for-loop will execute n times. But what bugs me the most is the HashSet initialization - should it be taken into account here? How does it affect the time complexity? Is this function O(n)? Or O(n+m) because of the HashSet initialization?
Is there any way to solve this task and achieve O(n)?
If you have n elements in A, m elements in B, and the strings are of length k, the expected time of a hashmap approach is O(k*(m + n)). Unfortunately the worst time is O(km(m + n)) if the hashing algorithm doesn't work. (The odds of which are very low.) I had this wrong before, thanks to @PaulHankin for the correction.
To get O(k*(m + n)) worst time we have to take a very different approach. What you do is build a trie out of B. And now you go through each element of A and look it up in the trie. Unlike a hash, a trie has guaranteed worst case performance (and better yet, allows prefix lookups even though we aren't using that). This approach gives us not just expected average time O(k*(m + n)) but also the same worst time.
You cannot do better than this because just processing the lists requires processing O(k*(m + n)) data.
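A minimal sketch of the trie approach described above. The names TrieFilter and TrieNode are hypothetical. Children are kept in a SortedDictionary rather than a Dictionary so that no step relies on hashing; that costs a logarithmic factor in the alphabet size per character, which is a constant for a fixed alphabet.

```csharp
using System;
using System.Collections.Generic;

public static class TrieFilter
{
    private sealed class TrieNode
    {
        public readonly SortedDictionary<char, TrieNode> Children =
            new SortedDictionary<char, TrieNode>();
        public bool IsWord;
    }

    public static int[] GetExistingStrings(string[] a, string[] b)
    {
        // Build a trie containing every string in b.
        var root = new TrieNode();
        foreach (string word in b)
        {
            TrieNode node = root;
            foreach (char c in word)
            {
                if (!node.Children.TryGetValue(c, out TrieNode child))
                {
                    child = new TrieNode();
                    node.Children.Add(c, child);
                }
                node = child;
            }
            node.IsWord = true;
        }

        // Keep the index of every string in a that is not in the trie.
        var indices = new List<int>();
        for (int i = 0; i < a.Length; i++)
        {
            if (!Contains(root, a[i])) indices.Add(i);
        }
        return indices.ToArray();
    }

    private static bool Contains(TrieNode root, string word)
    {
        TrieNode node = root;
        foreach (char c in word)
        {
            // A missing child means no string in b starts with this prefix.
            if (!node.Children.TryGetValue(c, out node)) return false;
        }
        return node.IsWord;
    }
}
```

With the arrays from the question ("one", "two", "three", "four" and "two", "three") this returns the indices 0 and 3.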
Here is how you could rewrite your second approach using LINQ, while also selecting case-insensitive string comparison:
public static int[] GetExistingStrings(string[] first, string[] second)
{
var secondSet = new HashSet<string>(second, StringComparer.OrdinalIgnoreCase);
return first
.Select((e, i) => (Element : e, Index : i))
.Where(p => !secondSet.Contains(p.Element))
.Select(p => p.Index)
.ToArray();
}
The time and space complexity is the same (O(n)). It's just a more fancy way to do the same thing.

Variable number of for loops without recursion but with Stack?

I know the usual approach for "variable number of for loops" is said to use a recursive method. But I wonder if I could solve that without recursion and instead with using Stack, since you can bypass recursion with the use of a stack.
My example:
I have a variable number of collections and I need to combine every item of every collection with every other item of the other collections.
// example for collections A, B and C:
A (4 items) + B (8 items) + C (10 items)
4 * 8 * 10 = 320 combinations
I need to run through all those 320 combinations. Yet at compile time I don't know if B or C or D exist. How would a solution with no recursive method but with the use of an instance of Stack look like?
Edit:
I realized Stack is not necessary here at all, while you can avoid recursion with a simple int array and a few while loops. Thanks for help and info.
Not with a stack but without recursion.
void Main()
{
var l = new List<List<int>>()
{
new List<int>(){ 1,2,3 },
new List<int>(){ 4,5,6 },
new List<int>(){ 7,8,9 }
};
var result = CartesianProduct(l);
}
static IEnumerable<IEnumerable<T>> CartesianProduct<T>(IEnumerable<IEnumerable<T>> sequences)
{
IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>()};
return sequences.Aggregate(
emptyProduct,
(accumulator, sequence) =>
from accseq in accumulator
from item in sequence
select accseq.Concat(new[] {item})
);
}
Function taken from Computing a Cartesian Product with LINQ
Here is an example of how to do this. The algorithm is taken from this question - https://stackoverflow.com/a/2419399/5311735 and converted to C#. Note that it can be made more efficient, but I converted the inefficient version to C# because it better illustrates the concept (you can see the more efficient version in the linked question):
static IEnumerable<T[]> CartesianProduct<T>(IList<IList<T>> collections) {
// this contains the indexes of elements from each collection to combine next
var indexes = new int[collections.Count];
bool done = false;
while (!done) {
// initialize array for next combination
var nextProduct = new T[collections.Count];
// fill it
for (int i = 0; i < collections.Count; i++) {
var collection = collections[i];
nextProduct[i] = collection[indexes[i]];
}
yield return nextProduct;
// now we need to calculate indexes for the next combination
// for that, increase last index by one, until it becomes equal to the length of last collection
// then increase second last index by one until it becomes equal to the length of second last collection
// and so on - basically the same how you would do with regular numbers - 09 + 1 = 10, 099 + 1 = 100 and so on.
var j = collections.Count - 1;
while (true) {
indexes[j]++;
if (indexes[j] < collections[j].Count) {
break;
}
indexes[j] = 0;
j--;
if (j < 0) {
done = true;
break;
}
}
}
}
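Since the question asked specifically about a Stack, here is one way it could be done, as a sketch only. Each stack entry holds a partially filled combination plus the index of the next collection to draw from; the names StackProduct and pending are made up for illustration. It clones the partial array at every step, so it favours clarity over speed, and it yields nothing when there are no collections at all.

```csharp
using System;
using System.Collections.Generic;

public static class StackProduct
{
    public static IEnumerable<T[]> CartesianProduct<T>(IList<IList<T>> collections)
    {
        if (collections.Count == 0) yield break;

        // Each entry: how many positions are filled, and the partial result.
        var pending = new Stack<(int Depth, T[] Partial)>();
        pending.Push((0, new T[collections.Count]));

        while (pending.Count > 0)
        {
            var (depth, partial) = pending.Pop();
            if (depth == collections.Count)
            {
                yield return partial; // every position is filled
                continue;
            }
            IList<T> current = collections[depth];
            // Push in reverse so items are popped in their original order.
            for (int i = current.Count - 1; i >= 0; i--)
            {
                var next = (T[])partial.Clone();
                next[depth] = current[i];
                pending.Push((depth + 1, next));
            }
        }
    }
}
```

The stack plays the role of the call stack in the recursive version, which is why the index-array approach above is usually simpler when all you need is the product.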

Most efficient sorting algorithm for sorted sub-sequences

I have several sorted sequences of numbers of type long (ascending order) and want to generate one master sequence that contains all elements in the same order. I look for the most efficient sorting algorithm to solve this problem. I target C#, .Net 4.0 and thus also welcome ideas targeting parallelism.
Here is an example:
s1 = 1,2,3,5,7,13
s2 = 2,3,6
s3 = 4,5,6,7,8
resulting Sequence = 1,2,2,3,3,4,5,5,6,6,7,7,8,13
Edit: When there are two (or more) identical values then the order of those two (or more) does not matter.
Just merge the sequences. You do not have to sort them again.
There is no .NET Framework method that I know of to do a K-way merge. Typically, it's done with a priority queue (often a heap). It's not difficult to do, and it's quite efficient. Given K sorted lists, together holding N items, the complexity is O(N log K).
I show a simple binary heap class in my article A Generic Binary Heap Class. In Sorting a Large Text File, I walk through the creation of multiple sorted sub-files and using the heap to do the K-way merge. Given an hour (perhaps less) of study, you can probably adapt that for use in your program.
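A sketch of the heap-based K-way merge described above. It uses the PriorityQueue<TElement, TPriority> class that was added in .NET 6, so it does not apply to .NET 4.0 as-is; there you would substitute your own binary heap. The name HeapMerge is illustrative.

```csharp
using System;
using System.Collections.Generic;

public static class HeapMerge
{
    public static IEnumerable<long> Merge(IEnumerable<IEnumerable<long>> sequences)
    {
        // The heap holds one enumerator per non-empty sequence, keyed by
        // the value that enumerator is currently positioned on.
        var heap = new PriorityQueue<IEnumerator<long>, long>();
        foreach (IEnumerable<long> sequence in sequences)
        {
            IEnumerator<long> e = sequence.GetEnumerator();
            if (e.MoveNext()) heap.Enqueue(e, e.Current);
            else e.Dispose();
        }

        while (heap.Count > 0)
        {
            // Take the smallest current value, then re-insert its
            // enumerator if that sequence has more items.
            IEnumerator<long> e = heap.Dequeue();
            yield return e.Current;
            if (e.MoveNext()) heap.Enqueue(e, e.Current);
            else e.Dispose();
        }
    }
}
```

Each item passes through the heap once and the heap never holds more than K entries, which is where the O(N log K) bound comes from.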
You just have to merge your sequences like in a merge sort.
And this is parallelizable:
merge sequences (1 and 2 in 1/2), (3 and 4 in 3/4), …
merge sequences (1/2 and 3/4 in 1/2/3/4), (5/6 and 7/8 in 5/6/7/8), …
…
Here is the merge function :
int j = 0;
int k = 0;
for(int i = 0; i < size_merged_seq; i++)
{
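// Take from seq1 when seq2 is exhausted (size_seq2 is its length) or when seq1 has the smaller value.
if (j < size_seq1 && (k >= size_seq2 || seq1[j] < seq2[k]))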
{
merged_seq[i] = seq1[j];
j++;
}
else
{
merged_seq[i] = seq2[k];
k++;
}
}
The easy way is to merge them with each other one by one. However, this will require O(n*k^2) time, where k is the number of sequences and n is the average number of items per sequence. Using a divide and conquer approach you can lower this to O(n*k*log k). The algorithm is as follows:
Divide the k sequences into k/2 groups of 2 sequences each (plus 1 group of 1 sequence if k is odd).
Merge the sequences in each group. This leaves you with about k/2 new sequences.
Repeat until you get single sequence.
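The steps above can be sketched as follows. This is a sequential illustration with made-up names (PairwiseMerge, MergeTwo, MergeAll); the merges within one round are independent of each other, so they are the natural place to introduce parallelism.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairwiseMerge
{
    // Standard two-way merge of two ascending lists.
    public static List<long> MergeTwo(IReadOnlyList<long> a, IReadOnlyList<long> b)
    {
        var merged = new List<long>(a.Count + b.Count);
        int i = 0, j = 0;
        while (i < a.Count && j < b.Count)
            merged.Add(a[i] <= b[j] ? a[i++] : b[j++]);
        while (i < a.Count) merged.Add(a[i++]);
        while (j < b.Count) merged.Add(b[j++]);
        return merged;
    }

    // Merges neighbouring pairs round after round until one list remains.
    public static List<long> MergeAll(IEnumerable<IEnumerable<long>> sequences)
    {
        List<List<long>> current = sequences.Select(s => s.ToList()).ToList();
        if (current.Count == 0) return new List<long>();
        while (current.Count > 1)
        {
            var next = new List<List<long>>();
            for (int i = 0; i + 1 < current.Count; i += 2)
                next.Add(MergeTwo(current[i], current[i + 1]));
            // With an odd count, the last list is carried over unmerged.
            if (current.Count % 2 == 1) next.Add(current[current.Count - 1]);
            current = next;
        }
        return current[0];
    }
}
```

The number of lists halves each round, so there are about log k rounds, and each round touches every item once.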
UPDATE:
It turns out that, for all these algorithms, the simple way is still faster:
private static List<T> MergeSorted<T>(IEnumerable<IEnumerable<T>> sortedBunches)
{
var list = sortedBunches.SelectMany(bunch => bunch).ToList();
list.Sort();
return list;
}
And for legacy purposes...
Here is the final version by prioritizing:
private static IEnumerable<T> MergeSorted<T>(IEnumerable<IEnumerable<T>> sortedInts) where T : IComparable<T>
{
var enumerators = new List<IEnumerator<T>>(sortedInts.Select(ints => ints.GetEnumerator()).Where(e => e.MoveNext()));
enumerators.Sort((e1, e2) => e1.Current.CompareTo(e2.Current));
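// Loop until every enumerator is exhausted (this also covers the case where all inputs are empty).
while (enumerators.Count > 0)
{
yield return enumerators[0].Current;
if (enumerators[0].MoveNext())
{
// Move the advanced enumerator back to its sorted position; comparing only with
// the second entry is not enough when there are more than two sequences.
for (int i = 0; i + 1 < enumerators.Count && enumerators[i].Current.CompareTo(enumerators[i + 1].Current) > 0; i++)
{
var tmp = enumerators[i];
enumerators[i] = enumerators[i + 1];
enumerators[i + 1] = tmp;
}
}
else
{
enumerators.RemoveAt(0);
}
}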
}

Calculating Nth permutation step?

I have a char[26] of the letters a-z and via nested for statements I'm producing a list of sequences like:
aaa, aaz... aba, abb, abz, ... zzy, zzz.
Currently, the software is written to generate the list of all possible values from aaa-zzz and then maintains an index, and goes through each of them performing an operation on them.
The list is obviously large, it's not ridiculously large, but it's gotten to the point where the memory footprint is too large (there are also other areas being looked at, but this is one that I've got).
I'm trying to produce a formula where I can keep the index, but do away with the list of sequences and calculate the current sequence based on the current index (as the time between operations between sequences is long).
Eg:
char[] characters = {a, b, c... z};
int currentIndex = 29; // abd
public string CurrentSequence(int currentIndex)
{
int ndx1 = getIndex1(currentIndex); // = 0
int ndx2 = getIndex2(currentIndex); // = 1
int ndx3 = getIndex3(currentIndex); // = 3
return string.Format(
"{0}{1}{2}",
characters[ndx1],
characters[ndx2],
characters[ndx3]); // abd
}
I've tried working out a small example using a subset (abc) and trying to index into that using modulo division, but I'm having trouble thinking today and I'm stumped.
I'm not asking for an answer, just any kind of help. Maybe a kick in the right direction?
Hint: Think of how you would print a number in base 26 instead of base 10, and with letters instead of digits. What's the general algorithm for displaying a number in an arbitrary base?
Spoiler: (scroll right to view)
int ndx1 = currentIndex / 26 / 26 % 26;
int ndx2 = currentIndex / 26 % 26;
int ndx3 = currentIndex % 26;
Something like this ought to work, assuming 26 characters:
public string CurrentSequence(int currentIndex) {
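// Adding chars with + would produce an int, so build the string from a char array instead.
return new string(new[] {
characters[currentIndex / (26 * 26)],
characters[(currentIndex / 26) % 26],
characters[currentIndex % 26] });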
}
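The same base-conversion idea can be written for any alphabet and sequence length. This is a sketch with illustrative names (SequenceIndex, SequenceAt); it assumes the index is between 0 and alphabet.Length^length - 1.

```csharp
using System;

public static class SequenceIndex
{
    // Treats index as a number written in base alphabet.Length and
    // returns its digits as characters, padded to the given length.
    public static string SequenceAt(int index, char[] alphabet, int length)
    {
        var buffer = new char[length];
        // Fill from the right: the last character changes fastest.
        for (int pos = length - 1; pos >= 0; pos--)
        {
            buffer[pos] = alphabet[index % alphabet.Length];
            index /= alphabet.Length;
        }
        return new string(buffer);
    }
}
```

With the 26 letters a-z and a length of 3, index 29 gives abd (29 = 0*676 + 1*26 + 3), index 0 gives aaa and index 17575 gives zzz.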
Wow, two questions in one day that can be solved via Cartesian products. Amazing.
You can use Eric Lippert's LINQ snippet to generate all combinations of the index values. This approach results in a streaming set of values, so they don't require storage in memory. This approach nicely separates the logic of generating the codes from maintaining state or performing computation with the code.
Eric's code for all combinations:
static IEnumerable<IEnumerable<T>> CartesianProduct<T>(this IEnumerable<IEnumerable<T>> sequences)
{
IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
return sequences.Aggregate(
emptyProduct,
(accumulator, sequence) =>
from accseq in accumulator
from item in sequence
select accseq.Concat(new[] {item}));
}
You can now write:
public static IEnumerable<string> AllCodes()
{
char[] characters = {a, b, c... z};
IEnumerable<char[]> codeSets = new[] { characters, characters, characters };
foreach( var codeValues in codeSets.CartesianProduct() )
{
// codeValues is an IEnumerable<char> and has no indexer, so build the string from it directly.
yield return new string(codeValues.ToArray());
}
}
The code above generates a streaming sequence of all code strings from aaa to zzz. You can now use this elsewhere where you perform your processing:
foreach( var code in AllCodes() )
{
// use the code value somehow...
}
There are multiple ways to solve your problem, but an option is to generate the sequence on the fly instead of storing it in a list:
IEnumerable<String> Sequence() {
for (var c1 = 'a'; c1 <= 'z'; ++c1)
for (var c2 = 'a'; c2 <= 'z'; ++c2)
for (var c3 = 'a'; c3 <= 'z'; ++c3)
yield return String.Format("{0}{1}{2}", c1, c2, c3);
}
You can then enumerate all the strings:
foreach (var s in Sequence())
Console.WriteLine(s);
This code doesn't use indices at all but it allows you to create a loop around the sequence of strings using simple code and without storing the strings.
