Speed up processing 32 bit numbers in combinations (k from n)

Speed up processing 32 bit numbers in combinations (k from n) - c#

I have a list of 128 32 bit numbers, and I want to know, is there any combination of 12 numbers, so that all numbers XORed give the 32 bit number with all bits set to 1.
So I have started with naive approach and took combinations generator like that:
private static IEnumerable<int[]> Combinations(int k, int n)
{
var state = new int[k];
var stack = new Stack<int>();
stack.Push(0);
while (stack.Count > 0)
{
var index = stack.Count - 1;
var value = stack.Pop();
while (value < n)
{
state[index++] = value++;
if (value < n)
{
stack.Push(value);
}
if (index == k)
{
yield return state;
break;
}
}
}
}
and used it like that (data32 is an array of given 32bit numbers)
foreach (var probe in Combinations(12, 128))
{
int p = 0;
foreach (var index in probe)
{
p = p ^ data32[index];
}
if (p == -1)
{
//print out found combination
}
}
Of course it takes forever to check all 23726045489546400 combinations...
So my question(s) are - am I missing something in options how to speedup the check process?
Even if I do the calculation of combinations in partitions (e.g. I could start like 8 threads each will check combination started with numbers 0..8), or speed up the XORing by storing the perviously calculated combination - it is still slow.
P.S. I'd like it to run in reasonable time - minutes, hours not years.
Adding a list of numbers as was requested in one of the comments:
1571089837
2107702069
466053875
226802789
506212087
484103496
1826565655
944897655
1370004928
748118360
1000006005
952591039
2072497930
2115635395
966264796
1229014633
827262231
1276114545
1480412665
2041893083
512565106
1737382276
1045554806
172937528
1746275907
1376570954
1122801782
2013209036
1650561071
1595622894
425898265
770953281
422056706
477352958
1295095933
1783223223
842809023
1939751129
1444043041
1560819338
1810926532
353960897
1128003064
1933682525
1979092040
1987208467
1523445101
174223141
79066913
985640026
798869234
151300097
770795939
1489060367
823126463
1240588773
490645418
832012849
188524191
1034384571
1802169877
150139833
1762370591
1425112310
2121257460
205136626
706737928
265841960
517939268
2070634717
1703052170
1536225470
1511643524
1220003866
714424500
49991283
688093717
1815765740
41049469
529293552
1432086255
1001031015
1792304327
1533146564
399287468
1520421007
153855202
1969342940
742525121
1326187406
1268489176
729430821
1785462100
1180954683
422085275
1578687761
2096405952
1267903266
2105330329
471048135
764314242
459028205
1313062337
1995689086
1786352917
2072560816
282249055
1711434199
1463257872
1497178274
472287065
246628231
1928555152
1908869676
1629894534
885445498
1710706530
1250732374
107768432
524848610
2791827620
1607140095
1820646148
774737399
1808462165
194589252
1051374116
1802033814

I don't know C#, I did something in Python, maybe interesting anyway. Takes about 0.8 seconds to find a solution for your sample set:
solution = {422056706, 2791827620, 506212087, 1571089837, 827262231, 1650561071, 1595622894, 512565106, 205136626, 944897655, 966264796, 477352958}
len(solution) = 12
solution.issubset(nums) = True
hex(xor(solution)) = '0xffffffff'
There are 128C12 combinations, that's 5.5 million times as many as the 232 possible XOR values. So I tried being optimistic and only tried a subset of the possible combinations. I split the 128 numbers into two blocks of 28 and 100 numbers and try combinations with six numbers from each of the two blocks. I put all possible XORs of the first block into a hash set A, then go through all XORs of the second block to find one whose bitwise inversion is in that set. Then I reconstruct the individual numbers.
This way I cover (28C6)2 × (100C6)2 = 4.5e14 combinations, still over 100000 times as many as there are possible XOR values. So probably still a very good chance to find a valid combination.
Code (Try it online!):
from itertools import combinations
from functools import reduce
from operator import xor as xor_
nums = list(map(int, '1571089837 2107702069 466053875 226802789 506212087 484103496 1826565655 944897655 1370004928 748118360 1000006005 952591039 2072497930 2115635395 966264796 1229014633 827262231 1276114545 1480412665 2041893083 512565106 1737382276 1045554806 172937528 1746275907 1376570954 1122801782 2013209036 1650561071 1595622894 425898265 770953281 422056706 477352958 1295095933 1783223223 842809023 1939751129 1444043041 1560819338 1810926532 353960897 1128003064 1933682525 1979092040 1987208467 1523445101 174223141 79066913 985640026 798869234 151300097 770795939 1489060367 823126463 1240588773 490645418 832012849 188524191 1034384571 1802169877 150139833 1762370591 1425112310 2121257460 205136626 706737928 265841960 517939268 2070634717 1703052170 1536225470 1511643524 1220003866 714424500 49991283 688093717 1815765740 41049469 529293552 1432086255 1001031015 1792304327 1533146564 399287468 1520421007 153855202 1969342940 742525121 1326187406 1268489176 729430821 1785462100 1180954683 422085275 1578687761 2096405952 1267903266 2105330329 471048135 764314242 459028205 1313062337 1995689086 1786352917 2072560816 282249055 1711434199 1463257872 1497178274 472287065 246628231 1928555152 1908869676 1629894534 885445498 1710706530 1250732374 107768432 524848610 2791827620 1607140095 1820646148 774737399 1808462165 194589252 1051374116 1802033814'.split()))
def xor(vals):
return reduce(xor_, vals)
A = {xor(a)^0xffffffff: a
for a in combinations(nums[:28], 6)}
for b in combinations(nums[28:], 6):
if a := A.get(xor(b)):
break
solution = {*a, *b}
print(f'{solution = }')
print(f'{len(solution) = }')
print(f'{solution.issubset(nums) = }')
print(f'{hex(xor(solution)) = }')

Arrange your numbers into buckets based on the position of the first 1 bit.
To set the first bit to 1, you will have to use an odd number of the items in the corresponding bucket....
As you recurse, try to maintain an invariant that the number of leading 1 bits is increasing and then select the bucket that will change the next 0 to a 1, this will greatly reduce the number of combinations that you have to try.

I have found a possible solution, which could work for my particular task.
As main issue to straitforward approach I see a number of 2E16 combinations.
But, if I want to check if combination of 12 elements equal to 0xFFFFFFFF, I could check if 2 different combinations of 6 elements with opposit values exists.
That will reduce number of combinations to "just" 5E9, which is achievable.
On first attempt I think to store all combinations and then find opposites in the big list. But, in .NET I could not find quick way of storing more then Int32.MaxValue elements.
Taking in account idea with bits from comments and answer, I decided to store at first only xor sums with leftmost bit set to 1, and then by definition I need to check only sums with leftmost bit set to 0 => reducing storage by 2.
In the end it appears that many collisions could appear, so there are many combinations with the same xor sum.
Current version which could find such combinations, need to be compiled in x64 mode and use (any impovements welcomed):
static uint print32(int[] comb, uint[] data)
{
uint p = 0;
for (int i = 0; i < comb.Length; i++)
{
Console.Write("{0} ", comb[i]);
p = p ^ data[comb[i]];
}
Console.WriteLine(" #[{0:X}]", p);
return p;
}
static uint[] data32;
static void Main(string[] args)
{
int n = 128;
int k = 6;
uint p = 0;
uint inv = 0;
long t = 0;
//load n numbers from a file
init(n);
var lookup1x = new Dictionary<uint, List<byte>>();
var lookup0x = new Dictionary<uint, List<byte>>();
Stopwatch watch = new Stopwatch();
watch.Start();
//do not use IEnumerable generator, use function directly to reuse xor value
var hash = new uint[k];
var comb = new int[k];
var stack = new Stack<int>();
stack.Push(0);
while (stack.Count > 0)
{
var index = stack.Count - 1;
var value = stack.Pop();
if (index == 0)
{
p = 0;
Console.WriteLine("Start {0} sequence, combinations found: {1}",value,t);
}
else
{
//restore previous xor value
p = hash[index - 1];
}
while (value < n)
{
//xor and store
p = p ^ data32[value];
hash[index] = p;
//remember current state (combination)
comb[index++] = value++;
if (value < n)
{
stack.Push(value);
}
//combination filled to end
if (index == k)
{
//if xor have MSB set, put it to lookup table 1x
if ((p & 0x8000000) == 0x8000000)
{
lookup1x[p] = comb.Select(i => (byte)i).ToList();
inv = p ^ 0xFFFFFFFF;
if (lookup0x.ContainsKey(inv))
{
var full = lookup0x[inv].Union(lookup1x[p]).OrderBy(x=>x).ToArray();
if (full.Length == 12)
{
print32(full, data32);
}
}
}
else
{
//otherwise put it to lookup table 2, but skip all combinations which are started with 0
if (comb[0] != 0)
{
lookup0x[p] = comb.Select(i => (byte)i).ToList();
inv = p ^ 0xFFFFFFFF;
if (lookup1x.ContainsKey(inv))
{
var full = lookup0x[p].Union(lookup1x[inv]).OrderBy(x=>x).ToArray();
if (full.Length == 12)
{
print32(full, data32);
}
}
}
}
t++;
break;
}
}
}
Console.WriteLine("Check was done in {0} ms ", watch.ElapsedMilliseconds);
//end
}

Related

Nearest value from user input in an array C#

so in my application , I read some files into it and ask the user for a number , in these files there a lot of numbers and I am trying to find the nearest value when the number they enter is not in the file. So far I have as following
static int nearest(int close_num, int[] a)
{
foreach (int bob in a)
{
if ((close_num -= bob) <= 0)
return bob;
}
return -1;
}
Console.WriteLine("Enter a number to find out if is in the selected Net File: ");
int i3 = Convert.ToInt32(Console.ReadLine());
bool checker = false;
//Single nearest = 0;
//linear search#1
for (int i = 0; i < a.Length; i++)//looping through array
{
if(a[i] == i3)//checking to see the value is found in the array
{
Console.WriteLine("Value found and the position of it in the descending value of the selected Net File is: " + a[i]);
checker = true;
}
else
{
int found = nearest(i3,a);
Console.WriteLine("Cannot find this number in the Net File however here the closest number to that: " + found );
//Console.WriteLine("Cannot find this number in the Net File however here the closest number to that : " + nearest);
}
}
When a value that is in the file is entered the output is fine , but when it comes to the nearest value I cannot figure a way. I can't use this such as BinarySearchArray for this. a = the array whilst i3 is the value the user has entered. Would a binary search algorithm just be simpler for this?
Any help would be appreciated.

You need to make a pass over all the elements of the array, comparing each one in turn to find the smallest difference. At the same time, keep a note of the current nearest value.
There are many ways to do this; here's a fairly simple one:
static int nearest(int close_num, int[] a)
{
int result = -1;
long smallestDelta = long.MaxValue;
foreach (int bob in a)
{
long delta = (bob > close_num) ? (bob - close_num) : (close_num - bob);
if (delta < smallestDelta)
{
smallestDelta = delta;
result = bob;
}
}
return result;
}
Note that delta is calculated so that it is the absolute value of the difference.

Well, first we should define, what is nearest. Assuming that,
int nearest for given int number is the item of int[] a such that Math.Abs(nearest - number) is the smallest possible value
we can put it as
static int nearest(int number, int[] a)
{
long diff = -1;
int result = 0;
foreach (int item in a)
{
// actual = Math.Abs((long)item - number);
long actual = (long)item - number;
if (actual < 0)
actual = -actual;
// if item is the very first value or better than result
if (diff < 0 || actual < diff) {
result = item;
diff = actual;
}
}
return result;
}
The only tricky part is long for diff: it may appear that item - number exceeds int range (and will either have IntegerOverflow exceprion thrown or *invalid answer), e.g.
int[] a = new int[] {int.MaxValue, int.MaxValue - 1};
Console.Write(nearest(int.MinValue, a));
Note, that expected result is 2147483646, not 2147483647

what about LINQ ?
var nearestNumber = a.OrderBy(x => Math.Abs(x - i3)).First();

Just iterate through massive and find the minimal delta between close_num and array members
static int nearest(int close_num, int[] a)
{
// initialize as big number, 1000 just an example
int min_delta=1000;
int result=-1;
foreach (int bob in a)
{
if (Math.Abs(bob-close_num) <= min_delta)
{
min_delta = bob-close_num;
result = bob;
}
}
return result;
}

How to remove duplicate lines from a large text file efficiently?

I want to edit a text as each line exists once in it. Each lines contains constantly 10 characters. I am generally working on 5-6 million of lines. So the code i am using currently is consuming too much RAM.
My code:
File.WriteAllLines(targetpath, File.ReadAllLines(sourcepath).Distinct())
So how can I make it less RAM consumer and less time-consumer at the same time?

Taking into account how much memory a string will take in C#, and assuming 10 characters length for 6 million records we get:
size in bytes ~= 20 + (length / 2 ) * 4;
total size in bytes ~= (20 + ( 10 / 2 ) * 4 )* 6000000 = 240 000 000
total size in Mb ~= 230
Now, 230 MB of space is not really a problem, even on x86 (32 bit system), so you can load all that data in memory.
For this, I would use a HashSet class which is obviously, a hash set that will let you easily eliminate the duplicates, by using lookup before adding an element.
In terms of big-O notation for time complexity, the average performance of a lookup in a hash set is O(1), which is the best you can get. In total, you would use lookup N times, totalling to N * O(1) = O(N)
In terms of big-O notation for space complexity, you would have O(N) space used, meaning that you use up memory proportional to number of elements, which is also the best you can get.
I'm not sure it is even possible to use up less space if you implement the algorithm in C# and not rely on any external components (that would also use at least O(N))
That being said, you can optimize for some scenarios by reading your file sequentially, line by line, see here.
This would give a better result if you have lots of duplicates, but worst case scenario when all the lines are distinct would consume the same amount of memory.
On a final note, if you look how Distinct method is implemented, you will see it also uses an implementation of hash table, although it's not the same class, but the performance is still roughly the same, check out this question for more details.

As ironstone13 corrected me, HashSet is OK, but does store the data.
Then this works fine too:
string[] arr = File.ReadAllLines("file.txt");
HashSet<string> hashes = new HashSet<string>();
for (int i = 0; i < arr.Length; i++)
{
if (!hashes.Add(arr[i])) arr[i] = null;
}
File.WriteAllLines("file2.txt", arr.Where(x => x != null));
This implementation was motivated by memory performance and hash conflicts.
The main idea was to keep just hashes, of course it would have to get back to file to get the line it sees as hash conflict/duplicit, to detect which one it is. (that part is not implemented).
class Program
{
static string[] arr;
static Dictionary<int, int>[] hashes = new Dictionary<int, int>[1]
{ new Dictionary<int, int>() }
;
static int[] file_indexes = {-1};
static void AddHash(int hash, int index)
{
for (int h = 0; h < hashes.Length; h++)
{
Dictionary<int, int> dict = hashes[h];
if (!dict.ContainsKey(hash))
{
dict[hash] = index;
return;
}
}
hashes = hashes.Union(new[] {new Dictionary<int, int>() {{hash, index}}}).ToArray();
file_indexes = Enumerable.Range(0, hashes.Length).Select(x => -1).ToArray();
}
static int UpdateFileIndexes(int hash)
{
int updates = 0;
for (int h = 0; h < hashes.Length; h++)
{
int index;
if (hashes[h].TryGetValue(hash, out index))
{
file_indexes[h] = index;
updates++;
}
else
{
file_indexes[h] = -1;
}
}
return updates;
}
static bool IsDuplicate(int index)
{
string str1 = arr[index];
for (int h = 0; h < hashes.Length; h++)
{
int i = file_indexes[h];
if (i == -1 || index == i) continue;
string str0 = arr[i];
if (str0 == null) continue;
if (string.CompareOrdinal(str0, str1) == 0) return true;
}
return false;
}
static void Main(string[] args)
{
arr = File.ReadAllLines("file.txt");
for (int i = 0; i < arr.Length; i++)
{
int hash = arr[i].GetHashCode();
if (UpdateFileIndexes(hash) == 0) AddHash(hash, i);
else if (IsDuplicate(i)) arr[i] = null;
else AddHash(hash, i);
}
File.WriteAllLines("file2.txt", arr.Where(x => x != null));
Console.WriteLine("DONE");
Console.ReadKey();
}
}

Before you write your data, if your data is in a list or dictionary, you could run LINQ query and use group by to group all like keys. Then for each write to the output file.
Your question is a little vague as well. Are you creating a next text file every time and do you have to store the data in text? There are better formats to use such as XML and json

How to list all possible combinations of a N value set, where each value is from a fixed subset of values using C#?

if there are N values, each value can be drawn from a sub-set of values, say (1,2,3) for example, then how to derive the possible combinations? note that each one will contain N values, not the subsets.
for example, let's say N = 4, the possible outputs could be:
1,1,1,1
1,2,1,3
2,1,1,3
...

If you have M different values and want to generate N-element combinations, consider these combinations as N-digit numbers in M-ary numeral system. There are M^N such combinations. Pseudocode:
for i = 0 to Power(M, N) - 1 do
represent i in M-ary system:
tmp = i
for k = 0 to N - 1 do
digit[k] = tmp % M //integer modulo
tmp = tmp / M //integer division
Example: if N = 3, M = 3, there are 27 combinations , and at 11-th step we have
11(dec) = 102 (trinary), combination is (1,0,2) if set is zero-based, or (2,1,3) if it is 1-based

This is another solution, using a static utility class:
static class SequencesCalculation
{
public static List<int[]> Calculate(int[] availableValues, int digitsCount)
{
var combIndexes = CalculateRecursive(new List<int[]>(), availableValues.Length, new int[digitsCount], digitsCount - 1);
var result = combIndexes.Select(x => x.Select(i => availableValues[i]).ToArray()).ToList();
return result;
}
static List<int[]> CalculateRecursive(List<int[]> doneCombinations, int valuesCount, int[] array, int i)
{
doneCombinations.Add((int[])array.Clone());
//base case
if (array.All(x => x == valuesCount - 1))
return doneCombinations;
NextCombination(array, valuesCount, i);
return CalculateRecursive(doneCombinations, valuesCount, array, i);
}
static void NextCombination(int[] array, int valuesCount, int i)
{
array[i] = (array[i] + 1) % valuesCount;
if (i == 0)
return;
if (array[i] == 0)
NextCombination(array, valuesCount, i - 1);
}
}
I think the #MBo solution is far more elegant than mine. I don't know which is faster. His solution has lot of arithmetic divisions, but mine has much stack allocation forward and backward because of all the recursive method calls.
Anyway, I think that his solution should be checked as the correct answer.

Combination Algorithm

Length = input Long(can be 2550, 2880, 2568, etc)
List<long> = {618, 350, 308, 300, 250, 232, 200, 128}
The program takes a long value, for that particular long value we have to find the possible combination from the above list which when added give me a input result(same value can be used twice). There can be a difference of +/- 30.
Largest numbers have to be used most.
Ex:Length = 868
For this combinations can be
Combination 1 = 618 + 250
Combination 2 = 308 + 232 + 200 +128
Correct Combination would be Combination 1
But there should also be different combinations.
public static void Main(string[] args)
{
//subtotal list
List<int> totals = new List<int>(new int[] { 618, 350, 308, 300, 250, 232, 200, 128 });
// get matches
List<int[]> results = KnapSack.MatchTotal(2682, totals);
// print results
foreach (var result in results)
{
Console.WriteLine(string.Join(",", result));
}
Console.WriteLine("Done.");
}
internal static List<int[]> MatchTotal(int theTotal, List<int> subTotals)
{
List<int[]> results = new List<int[]>();
while (subTotals.Contains(theTotal))
{
results.Add(new int[1] { theTotal });
subTotals.Remove(theTotal);
}
if (subTotals.Count == 0)
return results;
subTotals.Sort();
double mostNegativeNumber = subTotals[0];
if (mostNegativeNumber > 0)
mostNegativeNumber = 0;
if (mostNegativeNumber == 0)
subTotals.RemoveAll(d => d > theTotal);
for (int choose = 0; choose <= subTotals.Count; choose++)
{
IEnumerable<IEnumerable<int>> combos = Combination.Combinations(subTotals.AsEnumerable(), choose);
results.AddRange(from combo in combos where combo.Sum() == theTotal select combo.ToArray());
}
return results;
}
public static class Combination
{
public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int choose)
{
return choose == 0 ?
new[] { new T[0] } :
elements.SelectMany((element, i) =>
elements.Skip(i + 1).Combinations(choose - 1).Select(combo => (new[] { element }).Concat(combo)));
}
}
I Have used the above code, can it be more simplified, Again here also i get unique values. A value can be used any number of times. But the largest number has to be given the most priority.
I have a validation to check whether the total of the sum is greater than the input value. The logic fails even there..

The algorithm you have shown assumes that the list is sorted in ascending order. If not, then you shall first have to sort the list in O(nlogn) time and then execute the algorithm.
Also, it assumes that you are only considering combinations of pairs and you exit on the first match.
If you want to find all combinations, then instead of "break", just output the combination and increment startIndex or decrement endIndex.
Moreover, you should check for ranges (targetSum - 30 to targetSum + 30) rather than just the exact value because the problem says that a margin of error is allowed.
This is the best solution according to me because its complexity is O(nlogn + n) including the sorting.

V4 - Recursive Method, using Stack structure instead of stack frames on thread
It works (tested in VS), but there could be some bugs remaining.
static int Threshold = 30;
private static Stack<long> RecursiveMethod(long target)
{
Stack<long> Combination = new Stack<long>(establishedValues.Count); //Can grow bigger, as big as (target / min(establishedValues)) values
Stack<int> Index = new Stack<int>(establishedValues.Count); //Can grow bigger
int lowerBound = 0;
int dimensionIndex = lowerBound;
long fail = -1 * Threshold;
while (true)
{
long thisVal = establishedValues[dimensionIndex];
dimensionIndex++;
long afterApplied = target - thisVal;
if (afterApplied < fail)
lowerBound = dimensionIndex;
else
{
target = afterApplied;
Combination.Push(thisVal);
if (target <= Threshold)
return Combination;
Index.Push(dimensionIndex);
dimensionIndex = lowerBound;
}
if (dimensionIndex >= establishedValues.Count)
{
if (Index.Count == 0)
return null; //No possible combinations
dimensionIndex = Index.Pop();
lowerBound = dimensionIndex;
target += Combination.Pop();
}
}
}
Maybe V3 - Suggestion for Ordered solution trying every combination
Although this isn't chosen as the answer for the related question, I believe this is a good approach - https://stackoverflow.com/a/17258033/887092(, otherwise you could try the chosen answer (although the output for that is only 2 items in set being summed, rather than up to n items)) - it will enumerate every option including multiples of the same value. V2 works but would be slightly less efficient than an ordered solution, as the same failing-attempt will likely be attempted multiple times.
V2 - Random Selection - Will be able to reuse the same number twice
I'm a fan of using random for "intelligence", allowing the computer to brute force the solution. It's also easy to distribute - as there is no state dependence between two threads trying at the same time for example.
static int Threshold = 30;
public static List<long> RandomMethod(long Target)
{
List<long> Combinations = new List<long>();
Random rnd = new Random();
//Assuming establishedValues is sorted
int LowerBound = 0;
long runningSum = Target;
while (true)
{
int newLowerBound = FindLowerBound(LowerBound, runningSum);
if (newLowerBound == -1)
{
//No more beneficial values to work with, reset
runningSum = Target;
Combinations.Clear();
LowerBound = 0;
continue;
}
LowerBound = newLowerBound;
int rIndex = rnd.Next(LowerBound, establishedValues.Count);
long val = establishedValues[rIndex];
runningSum -= val;
Combinations.Add(val);
if (Math.Abs(runningSum) <= 30)
return Combinations;
}
}
static int FindLowerBound(int currentLowerBound, long runningSum)
{
//Adjust lower bound, so we're not randomly trying a number that's too high
for (int i = currentLowerBound; i < establishedValues.Count; i++)
{
//Factor in the threshold, because an end aggregate which exceeds by 20 is better than underperforming by 21.
if ((establishedValues[i] - Threshold) < runningSum)
{
return i;
}
}
return -1;
}
V1 - Ordered selection - Will not be able to reuse the same number twice
Add this very handy extension function (uses a binary algorithm to find all combinations):
//Make sure you put this in a static class inside System namespace
public static IEnumerable<List<T>> EachCombination<T>(this List<T> allValues)
{
var collection = new List<List<T>>();
for (int counter = 0; counter < (1 << allValues.Count); ++counter)
{
List<T> combination = new List<T>();
for (int i = 0; i < allValues.Count; ++i)
{
if ((counter & (1 << i)) == 0)
combination.Add(allValues[i]);
}
if (combination.Count == 0)
continue;
yield return combination;
}
}
Use the function
static List<long> establishedValues = new List<long>() {618, 350, 308, 300, 250, 232, 200, 128, 180, 118, 155};
//Return is a list of the values which sum to equal the target. Null if not found.
List<long> FindFirstCombination(long target)
{
foreach (var combination in establishedValues.EachCombination())
{
//if (combination.Sum() == target)
if (Math.Abs(combination.Sum() - target) <= 30) //Plus or minus tolerance for difference
return combination;
}
return null; //Or you could throw an exception
}
Test the solution
var target = 858;
var result = FindFirstCombination(target);
bool success = (result != null && result.Sum() == target);
//TODO: for loop with random selection of numbers from the establishedValues, Sum and test through FindFirstCombination

How can I obtain all the possible combination of a subset?

Consider this List<string>
List<string> data = new List<string>();
data.Add("Text1");
data.Add("Text2");
data.Add("Text3");
data.Add("Text4");
The problem I had was: how can I get every combination of a subset of the list?
Kinda like this:
#Subset Dimension 4
Text1;Text2;Text3;Text4
#Subset Dimension 3
Text1;Text2;Text3;
Text1;Text2;Text4;
Text1;Text3;Text4;
Text2;Text3;Text4;
#Subset Dimension 2
Text1;Text2;
Text1;Text3;
Text1;Text4;
Text2;Text3;
Text2;Text4;
#Subset Dimension 1
Text1;
Text2;
Text3;
Text4;
I came up with a decent solution which a think is worth to share here.

Similar logic as Abaco's answer, different implementation....
foreach (var ss in data.SubSets_LB())
{
Console.WriteLine(String.Join("; ",ss));
}
public static class SO_EXTENSIONS
{
public static IEnumerable<IEnumerable<T>> SubSets_LB<T>(
this IEnumerable<T> enumerable)
{
List<T> list = enumerable.ToList();
ulong upper = (ulong)1 << list.Count;
for (ulong i = 0; i < upper; i++)
{
List<T> l = new List<T>(list.Count);
for (int j = 0; j < sizeof(ulong) * 8; j++)
{
if (((ulong)1 << j) >= upper) break;
if (((i >> j) & 1) == 1)
{
l.Add(list[j]);
}
}
yield return l;
}
}
}

I think, the answers in this question need some performance tests. I'll give it a go. It is community wiki, feel free to update it.
void PerfTest()
{
var list = Enumerable.Range(0, 21).ToList();
var t1 = GetDurationInMs(list.SubSets_LB);
var t2 = GetDurationInMs(list.SubSets_Jodrell2);
var t3 = GetDurationInMs(() => list.CalcCombinations(20));
Console.WriteLine("{0}\n{1}\n{2}", t1, t2, t3);
}
long GetDurationInMs(Func<IEnumerable<IEnumerable<int>>> fxn)
{
fxn(); //JIT???
var count = 0;
var sw = Stopwatch.StartNew();
foreach (var ss in fxn())
{
count = ss.Sum();
}
return sw.ElapsedMilliseconds;
}
OUTPUT:
1281
1604 (_Jodrell not _Jodrell2)
6817
Jodrell's Update
I've built in release mode, i.e. optimizations on. When I run via Visual Studio I don't get a consistent bias between 1 or 2, but after repeated runs LB's answer wins, I get answers approaching something like,
1190
1260
more
but if I run the test harness from the command line, not via Visual Studio, I get results more like this
987
879
still more

EDIT
I've accepted the performance gauntlet, what follows is my amalgamation that takes the best of all answers. In my testing, it seems to have the best performance yet.
public static IEnumerable<IEnumerable<T>> SubSets_Jodrell2<T>(
this IEnumerable<T> source)
{
var list = source.ToList();
var limit = (ulong)(1 << list.Count);
for (var i = limit; i > 0; i--)
{
yield return list.SubSet(i);
}
}
private static IEnumerable<T> SubSet<T>(
this IList<T> source, ulong bits)
{
for (var i = 0; i < source.Count; i++)
{
if (((bits >> i) & 1) == 1)
{
yield return source[i];
}
}
}
Same idea again, almost the same as L.B's answer but my own interpretation.
I avoid the use of an internal List and Math.Pow.
public static IEnumerable<IEnumerable<T>> SubSets_Jodrell(
this IEnumerable<T> source)
{
var count = source.Count();
if (count > 64)
{
throw new OverflowException("Not Supported ...");
}
var limit = (ulong)(1 << count) - 2;
for (var i = limit; i > 0; i--)
{
yield return source.SubSet(i);
}
}
private static IEnumerable<T> SubSet<T>(
this IEnumerable<T> source,
ulong bits)
{
var check = (ulong)1;
foreach (var t in source)
{
if ((bits & check) > 0)
{
yield return t;
}
check <<= 1;
}
}
You'll note that these methods don't work with more than 64 elements in the intial set but it starts to take a while then anyhow.

I developed a simple ExtensionMethod for lists:
/// <summary>
/// Obtain all the combinations of the elements contained in a list
/// </summary>
/// <param name="subsetDimension">Subset Dimension</param>
/// <returns>IEnumerable containing all the differents subsets</returns>
public static IEnumerable<List<T>> CalcCombinations<T>(this List<T> list, int subsetDimension)
{
//First of all we will create a binary matrix. The dimension of a single row
//must be the dimension of list
//on which we are working (we need a 0 or a 1 for every single element) so row
//dimension is to obtain a row-length = list.count we have to
//populate the matrix with the first 2^list.Count binary numbers
int rowDimension = Convert.ToInt32(Math.Pow(2, list.Count));
//Now we start counting! We will fill our matrix with every number from 1
//(0 is meaningless) to rowDimension
//we are creating binary mask, hence the name
List<int[]> combinationMasks = new List<int[]>();
for (int i = 1; i < rowDimension; i++)
{
//I'll grab the binary rapresentation of the number
string binaryString = Convert.ToString(i, 2);
//I'll initialize an array of the apropriate dimension
int[] mask = new int[list.Count];
//Now, we have to convert our string in a array of 0 and 1, so first we
//obtain an array of int then we have to copy it inside our mask
//(which have the appropriate dimension), the Reverse()
//is used because of the behaviour of CopyTo()
binaryString.Select(x => x == '0' ? 0 : 1).Reverse().ToArray().CopyTo(mask, 0);
//Why should we keep masks of a dimension which isn't the one of the subset?
// We have to filter it then!
if (mask.Sum() == subsetDimension) combinationMasks.Add(mask);
}
//And now we apply the matrix to our list
foreach (int[] mask in combinationMasks)
{
List<T> temporaryList = new List<T>(list);
//Executes the cycle in reverse order to avoid index out of bound
for (int iter = mask.Length - 1; iter >= 0; iter--)
{
//Whenever a 0 is found the correspondent item is removed from the list
if (mask[iter] == 0)
temporaryList.RemoveAt(iter);
}
yield return temporaryList;
}
}
}
So considering the example in the question:
# Row Dimension of 4 (list.Count)
Binary Numbers to 2^4
# Binary Matrix
0 0 0 1 => skip
0 0 1 0 => skip
[...]
0 1 1 1 => added // Text2;Text3;Text4
[...]
1 0 1 1 => added // Text1;Text3;Text4
1 1 0 0 => skip
1 1 0 1 => added // Text1;Text2;Text4
1 1 1 0 => added // Text1;Text2;Text3
1 1 1 1 => skip
Hope this can help someone :)
If you need clarification or you want to contribute feel free to add answers or comments (which one is more appropriate).

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Speed up processing 32 bit numbers in combinations (k from n) - c#

Related

Nearest value from user input in an array C#

How to remove duplicate lines from a large text file efficiently?

How to list all possible combinations of a N value set, where each value is from a fixed subset of values using C#?

Combination Algorithm

How can I obtain all the possible combination of a subset?

Categories

Resources