I have a char[26] of the letters a-z and via nested for statements I'm producing a list of sequences like:
aaa, aaz... aba, abb, abz, ... zzy, zzz.
Currently, the software is written to generate the list of all possible values from aaa-zzz and then maintains an index, and goes through each of them performing an operation on them.
The list is large. It's not ridiculously large, but its memory footprint has become a problem (other areas are being looked at too, but this is the one I've got).
I'm trying to produce a formula where I can keep the index, but do away with the list of sequences and calculate the current sequence based on the current index (as the time between operations between sequences is long).
Eg:
char[] characters = {a, b, c... z};
int currentIndex = 29; // abd
public string CurrentSequence(int currentIndex)
{
int ndx1 = getIndex1(currentIndex); // = 0
int ndx2 = getIndex2(currentIndex); // = 1
int ndx3 = getIndex3(currentIndex); // = 3
return string.Format(
"{0}{1}{2}",
characters[ndx1],
characters[ndx2],
characters[ndx3]); // abd
}
I've tried working out a small example using a subset (abc) and trying to index into that using modulo division, but I'm having trouble thinking today and I'm stumped.
I'm not asking for an answer, just any kind of help. Maybe a kick in the right direction?
Hint: Think of how you would print a number in base 26 instead of base 10, and with letters instead of digits. What's the general algorithm for displaying a number in an arbitrary base?
Spoiler: (scroll right to view)
int ndx1 = currentIndex / 26 / 26 % 26;
int ndx2 = currentIndex / 26 % 26;
int ndx3 = currentIndex % 26;
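The same idea extends to any sequence length by repeatedly taking the remainder and dividing. The following is only a sketch: the name `SequenceAt` and its `length` and `alphabet` parameters are illustrative and not part of the original code.

```csharp
using System;

// Hypothetical helper: returns the sequence at position `index` in the
// aaa..zzz ordering for a given length, without building the whole list.
static string SequenceAt(int index, int length, string alphabet = "abcdefghijklmnopqrstuvwxyz")
{
    var buffer = new char[length];
    // Fill from the right, because the last character changes fastest.
    for (int pos = length - 1; pos >= 0; pos--)
    {
        buffer[pos] = alphabet[index % alphabet.Length];
        index /= alphabet.Length;
    }
    return new string(buffer);
}

Console.WriteLine(SequenceAt(29, 3));     // abd
Console.WriteLine(SequenceAt(17575, 3));  // zzz
```

Only the index has to be kept between operations; each string is computed on demand.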
Something like this ought to work, assuming 26 characters:
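public string CurrentSequence(int currentIndex) {
// Build the string from a char array; adding chars with + would sum their numeric values.
return new string(new[] {
characters[currentIndex / (26 * 26)],
characters[(currentIndex / 26) % 26],
characters[currentIndex % 26] });
}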
Wow, two questions in one day that can be solved via Cartesian products. Amazing.
You can use Eric Lippert's LINQ snippet to generate all combinations of the index values. This approach results in a streaming set of values, so they don't require storage in memory. This approach nicely separates the logic of generating the codes from maintaining state or performing computation with the code.
Eric's code for all combinations:
static IEnumerable<IEnumerable<T>> CartesianProduct<T>(this IEnumerable<IEnumerable<T>> sequences)
{
IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
return sequences.Aggregate(
emptyProduct,
(accumulator, sequence) =>
from accseq in accumulator
from item in sequence
select accseq.Concat(new[] {item}));
}
You can now write:
public static IEnumerable<string> AllCodes()
{
char[] characters = {a, b, c... z};
IEnumerable<char[]> codeSets = new[] { characters, characters, characters };
foreach( var codeValues in codeSets.CartesianProduct() )
{
// codeValues is an IEnumerable<char>, so concatenate it rather than indexing into it
yield return string.Concat(codeValues);
}
}
The code above generates a streaming sequence of all code strings from aaa to zzz. You can now use this elsewhere where you perform your processing:
foreach( var code in AllCodes() )
{
// use the code value somehow...
}
There are multiple ways to solve your problem, but an option is to generate the sequence on the fly instead of storing it in a list:
IEnumerable<String> Sequence() {
for (var c1 = 'a'; c1 <= 'z'; ++c1)
for (var c2 = 'a'; c2 <= 'z'; ++c2)
for (var c3 = 'a'; c3 <= 'z'; ++c3)
yield return String.Format("{0}{1}{2}", c1, c2, c3);
}
You can then enumerate all the strings:
foreach (var s in Sequence())
Console.WriteLine(s);
This code doesn't use indices at all but it allows you to create a loop around the sequence of strings using simple code and without storing the strings.
Related
I have searched and found many algorithms on this topic, but none that fits my case, and I have not managed to adapt any of them to solve my problem.
I need a function that takes a List and then returns a List with a List of all these combinations. The lists should be combinations with all objects down to only one lone object.
Example:
fun(new List<obj> {objA, objB, objC});
Should return
public List<List<obj>> fun(List<obj> L){
...
return List{
List{objA},
List{objB},
List{objC},
List{objA, objB},
List{objA, objC},
List{objB, objC},
List{objA, objB, objC};
}
And I do not know in advance how long the list will be.
I know the mathematical expression
n! / (k! (n-k)!) + n! / ((k-1)! (n-(k-1))!) + ... + n! / (1! (n-1)!)
Where n is the number of available objects and k is the number of them you want to combine.
The result of this calculation will be the number of List< obj > that will be included in the returned List
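When every size from 1 up to n is wanted (that is, k = n, as in the three-object example), the sum has a closed form. This is the standard binomial identity, written out for reference:

```latex
\sum_{j=1}^{n} \binom{n}{j}
  \;=\; \sum_{j=1}^{n} \frac{n!}{j!\,(n-j)!}
  \;=\; 2^{n} - 1
```

For n = 3 this gives 2^3 - 1 = 7, which matches the seven lists in the example.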
But, as I said, I have not managed to get something sensible in code.
I use c# so I prefer answers in this language. But all help is welcome.
I have looked at so many algorithms that I now get more confused than it helps.
You can use an integral value to count up through all the combinations.
Then, for each value, check each bit in the number. Each 1 bit means that the corresponding item is included in the combination.
If you think about how binary numbers work, you'll understand how this algorithm works. For example, for 3 items, you will have a 3-bit binary number that goes from 001 to 111 with each of the 3 bits corresponding to one of the items, like so:
001
010
011
100
101
110
111
You should be able to see how we can use each bit to decide whether or not the corresponding item is included in that combination.
Here's a sample implementation - this works if the number of items is <= 32:
public static IEnumerable<IEnumerable<T>> Combinations<T>(IList<T> items)
{
return Combinations(items.Count).Select(comb => comb.Select(index => items[index]));
}
public static IEnumerable<IEnumerable<int>> Combinations(int n)
{
long m = 1L << n; // 1L keeps the shift in 64 bits, so n = 31 and n = 32 do not overflow
for (long i = 1; i < m; ++i)
yield return bitIndices((uint)i);
}
static IEnumerable<int> bitIndices(uint n)
{
uint mask = 1;
for (int bit = 0; bit < 32; ++bit, mask <<= 1)
if ((n & mask) != 0)
yield return bit;
}
You can test this with, for example, a list of characters A..E:
IList<char> test = "ABCDE".ToList();
foreach (var comb in Combinations(test))
Console.WriteLine(string.Concat(comb));
This outputs:
A
B
AB
C
AC
BC
ABC
D
AD
BD
ABD
CD
ACD
BCD
ABCD
E
AE
BE
ABE
CE
ACE
BCE
ABCE
DE
ADE
BDE
ABDE
CDE
ACDE
BCDE
ABCDE
If you want to turn the IEnumerable<IEnumerable<T>> into a List<List<T>>, just do the following:
List<List<T>> list = Combinations(inputList).Select(x => x.ToList()).ToList();
For example, for the List<char> above do this:
List<List<char>> list = Combinations(test).Select(x => x.ToList()).ToList();
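I'm not sure about performance, but from a readability point of view this is the best I came up with, using a couple of nested loops and LINQ's Skip and Take. Note that it only produces contiguous runs of the source list, so for { 1, 2, 3 } it does not return { 1, 3 }: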
var source = new List<int>() { 1, 2, 3 };
var target = new List<List<int>>();
for(var i = 0; i < source.Count; i++)
{
for(var j = i; j < source.Count; j++)
{
target.Add(new List<int>(source.Skip(i).Take(source.Count - j)));
}
}
You can see a live demo on rextester
I've been trying to solve this interview problem which asks to shuffle a string so that no two adjacent letters are identical
For example,
ABCC -> ACBC
The approach I'm thinking of is to
1) Iterate over the input string and store the (letter, frequency)
pairs in some collection
2) Now build a result string by pulling the highest frequency (that is > 0) letter that we didn't just pull
3) Update (decrement) the frequency whenever we pull a letter
4) return the result string if all letters have zero frequency
5) return error if we're left with only one letter with frequency greater than 1
With this approach we can save the more precious (less frequent) letters for last. But for this to work, we need a collection that lets us efficiently query a key and at the same time efficiently sort it by values. Something like this would work except we need to keep the collection sorted after every letter retrieval.
I'm assuming Unicode characters.
Any ideas on what collection to use? Or an alternative approach?
You can sort the letters by frequency, split the sorted list in half, and construct the output by taking letters from the two halves in turn. This takes a single sort.
Example:
Initial string: ACABBACAB
Sort: AAAABBBCC
Split: AAAA+BBBCC
Combine: ABABABCAC
If the number of letters of highest frequency exceeds half the length of the string, the problem has no solution.
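A minimal C# sketch of this idea. It is a variant rather than a literal transcription: the letters are grouped by descending frequency and written to positions 0, 2, 4, ... and then 1, 3, 5, ..., which amounts to interleaving the first half (rounded up) with the second half. The name `RearrangeNoAdjacent` and the choice to return null on failure are assumptions made for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper: returns a rearrangement with no equal neighbours,
// or null when no such rearrangement exists.
static string? RearrangeNoAdjacent(string input)
{
    int n = input.Length;
    // Group equal letters and put the most frequent group first.
    var groups = input.GroupBy(c => c)
                      .OrderByDescending(g => g.Count())
                      .ThenBy(g => g.Key)
                      .ToList();
    // No solution if one letter needs more than half the positions (rounded up).
    if (groups.Count > 0 && groups[0].Count() > (n + 1) / 2)
        return null;

    var result = new char[n];
    int pos = 0;
    foreach (char c in groups.SelectMany(g => g))
    {
        result[pos] = c;
        pos += 2;               // fill positions 0, 2, 4, ...
        if (pos >= n) pos = 1;  // then continue with 1, 3, 5, ...
    }
    return new string(result);
}

Console.WriteLine(RearrangeNoAdjacent("ACABBACAB")); // ABABACACB
```

Putting the most frequent letter first matters: a letter that fills exactly half of an even-length string must start at position 0 or 1, otherwise its run would wrap from the even positions into the odd ones and touch itself.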
Why not use two Data Structures: One for sorting (Like a Heap) and one for key retrieval, like a Dictionary?
The accepted answer may produce a correct result, but is likely not the 'correct' answer to this interview brain teaser, nor the most efficient algorithm.
The simple answer is to take the premise of a basic sorting algorithm and alter the looping predicate to check for adjacency rather than magnitude. This ensures that the 'sorting' operation is the only step required, and (like all good sorting algorithms) does the least amount of work possible.
Below is a C# example akin to insertion sort for simplicity (though many sorting algorithms could be similarly adjusted):
string NonAdjacencySort(string stringInput)
{
var input = stringInput.ToCharArray();
for(var i = 0; i < input.Length; i++)
{
var j = i;
while(j > 0 && j < input.Length - 1 &&
(input[j+1] == input[j] || input[j-1] == input[j]))
{
var tmp = input[j];
input[j] = input[j-1];
input[j-1] = tmp;
j--;
}
if(input.Length > 1 && input[1] == input[0]) // guard against inputs shorter than two characters
{
var tmp = input[0];
input[0] = input[input.Length-1];
input[input.Length-1] = tmp;
}
}
return new string(input);
}
The major change to standard insertion sort is that the function has to both look ahead and behind, and therefore needs to wrap around to the last index.
A final point is that this type of algorithm fails gracefully, providing a result with the fewest consecutive characters (grouped at the front).
Since I somehow got convinced to expand an off-hand comment into a full algorithm, I'll write it out as an answer, which must be more readable than a series of uneditable comments.
The algorithm is pretty simple, actually. It's based on the observation that if we sort the string and then divide it into two equal-length halves, plus the middle character if the string has odd length, then corresponding positions in the two halves must differ from each other, unless there is no solution. That's easy to see: if the two characters are the same, then so are all the characters between them, which totals ⌈n/2⌉+1 characters. But a solution is only possible if there are no more than ⌈n/2⌉ instances of any single character.
So we can proceed as follows:
Sort the string.
If the string's length is odd, output the middle character.
Divide the string (minus its middle character if the length is odd) into two equal-length halves, and interleave the two halves.
At each point in the interleaving, since the pair of characters differ from each other (see above), at least one of them must differ from the last character output. So we first output that character and then the corresponding one from the other half.
The sample code below is in C++, since I don't have a C# environment handy to test with. It's also simplified in two ways, both of which would be easy enough to fix at the cost of obscuring the algorithm:
If at some point in the interleaving, the algorithm encounters a pair of identical characters, it should stop and report failure. But in the sample implementation below, which has an overly simple interface, there's no way to report failure. If there is no solution, the function below returns an incorrect solution.
The OP suggests that the algorithm should work with Unicode characters, but the complexity of correctly handling multibyte encodings didn't seem to add anything useful to explain the algorithm. So I just used single-byte characters. (In C# and certain implementations of C++, there is no character type wide enough to hold a Unicode code point, so astral plane characters must be represented with a surrogate pair.)
#include <algorithm>
#include <iostream>
#include <string>
// If possible, rearranges 'in' so that there are no two consecutive
// instances of the same character.
std::string rearrange(std::string in) {
// Sort the input. The function is call-by-value,
// so the argument itself isn't changed.
std::string out;
size_t len = in.size();
if (in.size()) {
out.reserve(len);
std::sort(in.begin(), in.end());
size_t mid = len / 2;
size_t tail = len - mid;
char prev = in[mid];
// For odd-length strings, start with the middle character.
if (len & 1) out.push_back(prev);
for (size_t head = 0; head < mid; ++head, ++tail) {
// See explanatory text
if (in[tail] != prev) {
out.push_back(in[tail]);
out.push_back(prev = in[head]);
}
else {
out.push_back(in[head]);
out.push_back(prev = in[tail]);
}
}
}
return out;
}
You can do that by using a priority queue.
See the explanation below.
https://iq.opengenus.org/rearrange-string-no-same-adjacent-characters/
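A rough C# sketch of the priority-queue approach, for readers who do not want to follow the link. It assumes .NET 6 or later for `PriorityQueue<TElement, TPriority>`. The name `RearrangeWithQueue` and the null return on failure are illustrative choices, and the code is not taken from the linked article.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: always emit the most frequent letter that was not
// emitted in the previous step. Returns null when no arrangement exists.
static string? RearrangeWithQueue(string input)
{
    var counts = new Dictionary<char, int>();
    foreach (char c in input)
    {
        counts.TryGetValue(c, out int seen);
        counts[c] = seen + 1;
    }

    // PriorityQueue dequeues the smallest priority first, so negate the counts.
    var queue = new PriorityQueue<char, int>();
    foreach (var pair in counts)
        queue.Enqueue(pair.Key, -pair.Value);

    var output = new StringBuilder(input.Length);
    char held = '\0';  // letter emitted in the previous step
    int heldLeft = 0;  // copies of that letter still to place

    while (queue.Count > 0)
    {
        char c = queue.Dequeue();
        output.Append(c);
        // The previous letter becomes eligible again only after this step.
        if (heldLeft > 0) queue.Enqueue(held, -heldLeft);
        held = c;
        heldLeft = --counts[c];
    }
    // Leftover copies of the held letter mean the input cannot be arranged.
    return output.Length == input.Length ? output.ToString() : null;
}

Console.WriteLine(RearrangeWithQueue("ABCC") ?? "no solution");
```

The order in which equally frequent letters come out of the queue is not specified, so the exact result may differ between runs of different inputs with ties, but any non-null result has no equal neighbours.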
Here is a probabilistic approach. The algorithm is:
10) Select a random char from the input string.
20) Try to insert the selected char in a random position in the output string.
30) If it can't be inserted because of proximity with the same char, go to 10.
40) Remove the selected char from the input string and go to 10.
50) Continue until there are no more chars in the input string, or the failed attempts are too many.
public static string ShuffleNoSameAdjacent(string input, Random random = null)
{
if (input == null) return null;
if (random == null) random = new Random();
string output = "";
int maxAttempts = input.Length * input.Length * 2;
int attempts = 0;
while (input.Length > 0)
{
while (attempts < maxAttempts)
{
int inputPos = random.Next(0, input.Length);
var outputPos = random.Next(0, output.Length + 1);
var c = input[inputPos];
if (outputPos > 0 && output[outputPos - 1] == c)
{
attempts++; continue;
}
if (outputPos < output.Length && output[outputPos] == c)
{
attempts++; continue;
}
input = input.Remove(inputPos, 1);
output = output.Insert(outputPos, c.ToString());
break;
}
if (attempts >= maxAttempts) throw new InvalidOperationException(
$"Shuffle failed to complete after {attempts} attempts.");
}
return output;
}
Not suitable for strings longer than 1,000 chars!
Update: And here is a more complicated deterministic approach. The algorithm is:
Group the elements and sort the groups by length.
Create three empty piles of elements.
Insert each group to a separate pile, inserting always the largest group to the smallest pile, so that the piles differ in length as little as possible.
Check that there is no pile with more than half the total elements, in which case satisfying the condition of not having same adjacent elements is impossible.
Shuffle the piles.
Start yielding elements from the piles, selecting a different pile each time.
When the piles that are eligible for selection are more than one, select randomly, weighting by the size of each pile. Piles containing near half of the remaining elements should be much preferred. For example if the remaining elements are 100 and the two eligible piles have 49 and 40 elements respectively, then the first pile should be 10 times more preferable than the second (because 50 - 49 = 1 and 50 - 40 = 10).
public static IEnumerable<T> ShuffleNoSameAdjacent<T>(IEnumerable<T> source,
Random random = null, IEqualityComparer<T> comparer = null)
{
if (source == null) yield break;
if (random == null) random = new Random();
if (comparer == null) comparer = EqualityComparer<T>.Default;
var grouped = source
.GroupBy(i => i, comparer)
.OrderByDescending(g => g.Count());
var piles = Enumerable.Range(0, 3).Select(i => new Pile<T>()).ToArray();
foreach (var group in grouped)
{
GetSmallestPile().AddRange(group);
}
int totalCount = piles.Select(e => e.Count).Sum();
if (piles.Any(pile => pile.Count > (totalCount + 1) / 2))
{
throw new InvalidOperationException("Shuffle is impossible.");
}
Array.ForEach(piles, pile => Shuffle(pile)); // arrays have no instance ForEach method
Pile<T> previouslySelectedPile = null;
while (totalCount > 0)
{
var selectedPile = GetRandomPile_WeightedByLength();
yield return selectedPile[selectedPile.Count - 1];
selectedPile.RemoveAt(selectedPile.Count - 1);
totalCount--;
previouslySelectedPile = selectedPile;
}
List<T> GetSmallestPile()
{
List<T> smallestPile = null;
int smallestCount = Int32.MaxValue;
foreach (var pile in piles)
{
if (pile.Count < smallestCount)
{
smallestPile = pile;
smallestCount = pile.Count;
}
}
return smallestPile;
}
void Shuffle(List<T> pile)
{
for (int i = 0; i < pile.Count; i++)
{
int j = random.Next(i, pile.Count);
if (i == j) continue;
var temp = pile[i];
pile[i] = pile[j];
pile[j] = temp;
}
}
Pile<T> GetRandomPile_WeightedByLength()
{
var eligiblePiles = piles
.Where(pile => pile.Count > 0 && pile != previouslySelectedPile)
.ToArray();
Debug.Assert(eligiblePiles.Length > 0, "No eligible pile.");
Array.ForEach(eligiblePiles, pile =>
{
pile.Proximity = ((totalCount + 1) / 2) - pile.Count;
pile.Score = 1;
});
Debug.Assert(eligiblePiles.All(pile => pile.Proximity >= 0),
"A pile has negative proximity.");
foreach (var pile in eligiblePiles)
{
foreach (var otherPile in eligiblePiles)
{
if (otherPile == pile) continue;
pile.Score *= otherPile.Proximity;
}
}
var sumScore = eligiblePiles.Select(p => p.Score).Sum();
while (sumScore > Int32.MaxValue)
{
Array.ForEach(eligiblePiles, pile => pile.Score /= 100);
sumScore = eligiblePiles.Select(p => p.Score).Sum();
}
if (sumScore == 0)
{
return eligiblePiles[random.Next(0, eligiblePiles.Length)];
}
var randomScore = random.Next(0, (int)sumScore);
int accumulatedScore = 0;
foreach (var pile in eligiblePiles)
{
accumulatedScore += (int)pile.Score;
if (randomScore < accumulatedScore) return pile;
}
Debug.Fail("Could not select a pile randomly by weight.");
return null;
}
}
private class Pile<T> : List<T>
{
public int Proximity { get; set; }
public long Score { get; set; }
}
This implementation can shuffle millions of elements. I am not completely convinced that the quality of the shuffling is as good as that of the previous probabilistic implementation, but it should be close.
func shuffle(str:String)-> String{
var shuffleArray = [Character](str)
//Sorting
shuffleArray.sort()
var shuffle1 = [Character]()
var shuffle2 = [Character]()
var adjacentStr = ""
//Split
for i in 0..<shuffleArray.count{
if i >= (shuffleArray.count + 1) / 2 { // second half starts at the midpoint (rounded up)
shuffle2.append(shuffleArray[i])
}else{
shuffle1.append(shuffleArray[i])
}
}
let count = shuffle1.count > shuffle2.count ? shuffle1.count:shuffle2.count
//Merge with adjacent element
for i in 0..<count {
if i < shuffle1.count{
adjacentStr.append(shuffle1[i])
}
if i < shuffle2.count{
adjacentStr.append(shuffle2[i])
}
}
return adjacentStr
}
let s = shuffle(str: "AABC")
print(s)
I know the usual approach for "variable number of for loops" is said to use a recursive method. But I wonder if I could solve that without recursion and instead with using Stack, since you can bypass recursion with the use of a stack.
My example:
I have a variable number of collections and I need to combine every item of every collection with every other item of the other collections.
// example for collections A, B and C:
A (4 items) + B (8 items) + C (10 items)
4 * 8 * 10 = 320 combinations
I need to run through all those 320 combinations. Yet at compile time I don't know whether B, C or D exist. What would a solution look like that uses no recursive method but an instance of Stack instead?
Edit:
I realized Stack is not necessary here at all, since you can avoid recursion with a simple int array and a few while loops. Thanks for the help and info.
Not with a stack but without recursion.
void Main()
{
var l = new List<List<int>>()
{
new List<int>(){ 1,2,3 },
new List<int>(){ 4,5,6 },
new List<int>(){ 7,8,9 }
};
var result = CartesianProduct(l);
}
static IEnumerable<IEnumerable<T>> CartesianProduct<T>(IEnumerable<IEnumerable<T>> sequences)
{
IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>()};
return sequences.Aggregate(
emptyProduct,
(accumulator, sequence) =>
from accseq in accumulator
from item in sequence
select accseq.Concat(new[] {item})
);
}
Function taken from Computing a Cartesian Product with LINQ
Here is an example of how to do this. The algorithm is taken from this question - https://stackoverflow.com/a/2419399/5311735 and converted to C#. Note that it can be made more efficient, but I converted the inefficient version to C# because it better illustrates the concept (you can see the more efficient version in the linked question):
static IEnumerable<T[]> CartesianProduct<T>(IList<IList<T>> collections) {
// this contains the indexes of elements from each collection to combine next
var indexes = new int[collections.Count];
bool done = false;
while (!done) {
// initialize array for next combination
var nextProduct = new T[collections.Count];
// fill it
for (int i = 0; i < collections.Count; i++) {
var collection = collections[i];
nextProduct[i] = collection[indexes[i]];
}
yield return nextProduct;
// now we need to calculate indexes for the next combination
// for that, increase last index by one, until it becomes equal to the length of last collection
// then increase second last index by one until it becomes equal to the length of second last collection
// and so on - basically the same how you would do with regular numbers - 09 + 1 = 10, 099 + 1 = 100 and so on.
var j = collections.Count - 1;
while (true) {
indexes[j]++;
if (indexes[j] < collections[j].Count) {
break;
}
indexes[j] = 0;
j--;
if (j < 0) {
done = true;
break;
}
}
}
}
I want to easily pre-populate a single dimensional string array which I am calling "letters" with the values:
AAAAAA
AAAAAB
AAAAAC
AAAAAD
..
..
ZZZZZX
ZZZZZY
ZZZZZZ
That's about 309 million combinations (26^6 = 308,915,776), in order.
The idea being I need to then be able to ask for any particular combination of 6 characters such as BBCHHJ and use Array.Index to return the element of the array it is in.
I have the second bit fine:
String searchFor;
Console.Write("Enter a string value to search for: ");
searchFor = Console.ReadLine();
int indexValue = Array.IndexOf(letters, searchFor);
Console.WriteLine("The value you are after is in element index: " + indexValue);
Console.ReadLine();
But I have no idea how to easily initialise the letters array with all those combinations, in order!
A variation on Jakub's answer which should be a bit more efficient:
int result = s
.Select(c => c - 'A') // map 'A'-'Z' to 0-25
.Aggregate(0, (total, next) => total * 26 + next); // calculate the base 26 value
This has the advantage of avoiding the Reverse and the separate Sum, and the powers of 26 don't have to be calculated from scratch in each iteration.
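Since the search string comes from user input, the same base-26 accumulation can be wrapped with some validation. This is only a sketch: `TryLettersToIndex` is a hypothetical name, and it assumes upper-case A-Z input of at most six letters so that the result fits in an int.

```csharp
using System;

// Hypothetical helper: computes the position of an A-Z string in the
// AAAAAA..ZZZZZZ ordering, rejecting input the scheme cannot represent.
static bool TryLettersToIndex(string s, out int index)
{
    index = 0;
    // 26^6 - 1 = 308,915,775 fits in an int; seven letters would not.
    if (string.IsNullOrEmpty(s) || s.Length > 6) return false;

    int total = 0;
    foreach (char c in s)
    {
        if (c < 'A' || c > 'Z') return false;
        total = total * 26 + (c - 'A'); // shift left one base-26 digit, add the new one
    }
    index = total;
    return true;
}

if (TryLettersToIndex("AAAABB", out int position))
    Console.WriteLine(position); // 27
```

No array of strings is needed: the index is computed directly from the letters.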
Storing 308 million elements in an array and searching them is not the best solution; calculate the index at runtime instead. I have created a code sample:
string input = "ZZZZZZ";
//default values
string alphabets_s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
char[] alphabets = alphabets_s.ToCharArray();
int result = 0; //index of "AAAAAA"
//calculating index
for (int i = 0; i < input.Length; i++)
{
//get the zero-based position of the character in the alphabet
int index = Array.IndexOf(alphabets, input[i]);
//shift the digits so far one place to the left (base 26) and add the new digit;
//multiplying the positions together would map different strings to the same index
result = result * 26 + index;
}
PS: I did just basic testing, please inform me if you find anything wrong in it.
As I wrote in a comment, what you're trying to achieve is basically conversion from base 26 number.
The first step is to convert the string to a list of digits. Then just multiply by powers of 26 and add together:
var s = "AAAABB";
var result = s
.Select(c => c - 'A') //map characters to numbers: A -> 0, B -> 1 etc
.Reverse() //reverse the sequence to have the least significant digit first
.Select((d, i) => d * Math.Pow(26, i))
.Sum();
For the following characters a,b,c,d I want to find the following combinations.
The sequence is always sorted. I wonder how I should approach in finding the combinations?
a
b
c
d
ab
ac
ad
bc
bd
cd
abc
abd
acd
bcd
abcd
What you want is every single Combination. Normally when getting combinations you get all combinations of a particular size, n. We'll start out by creating that method to get the combinations of size n from a sequence:
public static IEnumerable<IEnumerable<T>> Combinations<T>(
this IEnumerable<T> source, int n)
{
if (n == 0)
{
yield return Enumerable.Empty<T>();
yield break; // nothing more to produce; avoids recursing with a negative n
}
int count = 1;
foreach (T item in source)
{
foreach (var innerSequence in source.Skip(count).Combinations(n - 1))
{
yield return new T[] { item }.Concat(innerSequence);
}
count++;
}
}
Once you have that it's a simple matter of getting the combinations of n for all n from 1 to the size of the sequence:
public static IEnumerable<IEnumerable<T>> AllCombinations<T>(this IList<T> source)
{
IEnumerable<IEnumerable<T>> output = Enumerable.Empty<IEnumerable<T>>();
for (int i = 0; i < source.Count; i++)
{
output = output.Concat(source.Combinations(i));
}
return output;
}
Some sample code that uses it:
var list = new List<string> { "a", "b", "c", "d" };
foreach (var sequence in list.AllCombinations())
{
Console.WriteLine(string.Join(" ", sequence));
}
It's worth noting that this operation is extraordinarily expensive for all but the tiniest input sequences. It's not exactly the most efficient around, but even if you do eke out every last bit of performance you won't be able to compute the combinations of sequences of more than 15-20 items, depending on how long you're willing to wait and how good your computer is.
You can use the Combinatorics library to calculate them for you (documentation), but as Servy said, length of the data is a major factor in how long it will take.
I have written a class to handle common functions for working with the binomial coefficient, which is the type of problem that your problem falls under. I have not looked at the Combinatorics library suggested by @Bobson, but I believe my class is probably much faster and more efficient. It performs the following tasks:
Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters.
Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle. My paper talks about this. I believe I am the first to discover and publish this technique, but I could be wrong.
Converts the index in a sorted binomial coefficient table to the corresponding K-indexes. I believe it might be faster than the link you have found.
Uses Mark Dominus method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to perform the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.
To read about this class and download the code, see Tablizing The Binomial Coeffieicent.
The solution to your problem involves generating the K-indexes for each N choose K case. So in your example above where there are 4 possibilities for N (A, B, C, D) the code (in C#) would look something like this:
int TotalNumberOfValuesInSet = 4;
int N = TotalNumberOfValuesInSet;
// Loop thru all the possible groups of combinations.
for (int K = 1; K < N; K++)
{
// Create the bin coeff object required to get all
// the combos for this N choose K combination.
BinCoeff<int> BC = new BinCoeff<int>(N, K, false);
int NumCombos = BinCoeff<int>.GetBinCoeff(N, K);
int[] KIndexes = new int[K];
// Loop thru all the combinations for this N choose K case.
for (int Combo = 0; Combo < NumCombos; Combo++)
{
// Get the k-indexes for this combination, which in this case
// are the indexes to each letter in the set starting with index zero.
BC.GetKIndexes(Combo, KIndexes);
// Do whatever processing that needs to be done with the indices in KIndexes.
...
}
// Handle the final combination, which in this case is ABCD, since K < N.
...
}
They're not truly subsets because there's nothing to stop your input sequence from containing duplicates, but the following extension method should work in the general case:
public static IEnumerable<IEnumerable<T>> Subsets<T>(this IEnumerable<T> source)
{
List<T[]> yielded = new List<T[]> { new T[0] };
foreach(T t in source)
{
List<T[]> newlyYielded = new List<T[]>();
foreach(var y in yielded)
{
var newSubset = y.Concat(new[] {t}).ToArray();
newlyYielded.Add(newSubset);
yield return newSubset;
}
yielded.AddRange(newlyYielded);
}
}
Basically starting with an empty sequence, it adds the empty sequence with the first item appended. Then for each of those two sequences, it adds that sequence with the next item appended. Then for each of those four sequences...
This has to keep a copy of each sequence generated, so will use a lot of memory.
To get strings out of string, you can call this as
"abcd".Subsets().Select(chars => new string(chars.ToArray()))
If you're not going to have many characters, you could take advantage of the fact that you can calculate the nth subset directly:
public static int SubsetCount(this string s)
{
return 1 << s.Length; // 2^Length subsets, counting the empty one
}
public static string NthSubset(this string s, int n)
{
var b = new StringBuilder();
int i = 0;
while (n > 0)
{
if ((n&1)==1) b.Append(s[i]);
i++;
n >>= 1;
}
return b.ToString();
}
The code of servy above is quite elegant, but it doesn't produce those combinations that have the same length as the source.
for (int i = 0; i < source.Count; i++) should be
for (int i = 0; i <= source.Count; i++).
Below is the vb.net variant, which can't use yield.
<Extension()>
Public Function Combinations(Of T)(source As IEnumerable(Of T), n As Integer) As IEnumerable(Of IEnumerable(Of T))
Dim lstResults As New List(Of IEnumerable(Of T))
If n = 0 Then
lstResults.Add(Enumerable.Empty(Of T))
Else
Dim count As Integer = 1
For Each item As T In source
For Each innerSequence In source.Skip(count).Combinations(n - 1)
lstResults.Add(New T() {item}.Concat(innerSequence))
Next
count += 1
Next
End If
Return lstResults
End Function
<Extension()>
Public Function AllCombinations(Of T)(source As IList(Of T)) As IEnumerable(Of IEnumerable(Of T))
Dim output As IEnumerable(Of IEnumerable(Of T)) = Enumerable.Empty(Of IEnumerable(Of T))()
For i As Integer = 0 To source.Count
output = output.Concat(source.Combinations(i))
Next
Return output
End Function