Divide a list of strings into groups randomly - C#

Given a list of strings with n items, I wish to divide it into b groups (b <= n), where each group has i to j (j >= i) items.
An example:
Say
List<string> lst=new List<string>(new string[]{"a","b","c","d"});
(Therefore n=4)
Assume the function that provides this functionality is
List<List<string>> DivideIntoGroup(List<string> lst, int b, int i, int j)
One possible result of DivideIntoGroup(lst, 3, 1, 2) is
{"a"},
{"b","c"},
{"d"}
How should I write the DivideIntoGroup function?

I am not a C# expert, so I will give you a purely mathematical solution and hopefully you will be able to translate it into your language.
Basically your task consists of two separate parts: choosing b groups of i to j elements each, and randomness. The second should be easy: just randomly shuffle the elements initially and then do the group splitting. Let's get down to the interesting part:
How to split n elements in b groups containing i to j elements each?
A straightforward solution would be to take a random number between i and j for the number of elements in the first group, then the second, and so on. However, there is no guarantee that doing so will not leave the last group with a number of elements outside i to j. Also, such a solution does not produce a uniform random distribution.
The correct approach is to choose the number of elements of the first group with a probability that respects how many overall solutions remain: you are basically interested in how many solutions exist for task(n, b, i, j) overall, and how many exist for task(n-k, b-1, i, j) if we assume we take k elements for the first group. If you can calculate just the number of solutions, you can take each k with its respective probability and randomly sample k for the first group, then the second, and so on...
So now the question is: how many solutions are there for task(n, b, i, j)?
Noting the fact that task(n, b, i, j) = sum(k=i to j) task(n-k, b - 1, i, j), you can find these numbers easily using recursion (with memoization / dynamic programming, so that you need not calculate the same value more than once).
PS: There might be a closed-form solution for the number of solutions, but I can't figure it out right away, and as long as n * b is kept relatively small (< 10^6) the recursive solution should work.
EDIT
PS2: actually the numbers in task(n, b, i, j) might get pretty large very fast, so consider using big integers.
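A minimal C# sketch of this count-then-sample idea; the names (CountWays, DivideIntoGroup) and the double-based random "ticket" draw are illustrative rather than part of the answer:
using System;
using System.Collections.Generic;
using System.Numerics;

static class GroupDivider
{
    // Memoized count of ways to split n items into b ordered groups of i..j items each:
    // task(n, b) = sum over k = i..min(j, n) of task(n - k, b - 1).
    static readonly Dictionary<(int, int), BigInteger> cache = new Dictionary<(int, int), BigInteger>();

    static BigInteger CountWays(int n, int b, int i, int j)
    {
        if (b == 0) return n == 0 ? BigInteger.One : BigInteger.Zero;
        if (cache.TryGetValue((n, b), out var known)) return known;

        BigInteger total = BigInteger.Zero;
        for (int k = i; k <= Math.Min(j, n); k++)
            total += CountWays(n - k, b - 1, i, j);

        cache[(n, b)] = total;
        return total;
    }

    public static List<List<string>> DivideIntoGroup(List<string> lst, int b, int i, int j)
    {
        var rnd = new Random();

        // Shuffle first, then split the shuffled list into consecutive runs.
        var items = new List<string>(lst);
        for (int x = items.Count - 1; x > 0; x--)
        {
            int y = rnd.Next(x + 1);
            (items[x], items[y]) = (items[y], items[x]);
        }

        cache.Clear();                        // i and j are fixed within one call
        int n = items.Count;
        if (CountWays(n, b, i, j) == 0)
            throw new ArgumentException("No valid split exists for these parameters.");

        var result = new List<List<string>>();
        int pos = 0;
        for (int g = b; g >= 1; g--)
        {
            // Draw a "ticket" in [0, task(n, g)) and pick the group size k whose
            // weight task(n - k, g - 1) covers it. Double precision is fine for a
            // sketch; very large counts would need a proper BigInteger random draw.
            BigInteger total = CountWays(n, g, i, j);
            BigInteger ticket = (BigInteger)(rnd.NextDouble() * (double)total);

            int size = -1;
            for (int k = i; k <= Math.Min(j, n); k++)
            {
                BigInteger ways = CountWays(n - k, g - 1, i, j);
                if (ways.IsZero) continue;
                size = k;                     // last feasible size is the fallback
                if (ticket < ways) break;
                ticket -= ways;
            }

            result.Add(items.GetRange(pos, size));
            pos += size;
            n -= size;
        }
        return result;
    }
}
Because the items are shuffled up front, taking consecutive runs of the shuffled list is enough; DivideIntoGroup(lst, 3, 1, 2) on the example above then returns three groups such as {"d"}, {"a", "c"}, {"b"}.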

What I would do as a solution is this; it is pseudocode, of course:
func(n, b, i, j)
{
    if (n == 0)
        return                               // finished
    if (i > j or i > min(j, n))
        return                               // no solution possible down this path
    out = choose_random_between(i, min(j, n))
    current_ave_of_cells_per_group = (n - out) / (b - 1)
    if current_ave_of_cells_per_group < i
        func(n, b, i, min(out - 1, n))
    else if current_ave_of_cells_per_group > j
        func(n, b, out + 1, min(j, n))
    else
        **form the group consisting of 'out' items**
        func(n - out, b - 1, i, min(j, n - out))
}

Related

Print all partitions into disjoint combinations of fixed size

I have an array of numbers from 1 to n, and I need to find all possible partitions into disjoint combinations of 3 numbers.
That is, for n = 9 the situation is as follows:
Array: 1, 2, 3, 4, 5, 6, 7, 8, 9;
Possible combinations of 3: 123, 124 ... 245, 246 ... 478, 479, etc.;
Possible partitions into 3 disjoint combinations: 123 456 789, 123 457 689 ... 123 468 579 ... 127 458 369, etc.
I've found an algorithm for finding combinations of 3 numbers from a set, here it is: https://www.geeksforgeeks.org/print-all-possible-combinations-of-r-elements-in-a-given-array-of-size-n/ (there are even 2 of them, but I used the first one). Now the question is how to find combinations of the combinations themselves, and this is already causing difficulties: it seems to me that I need to deal with recursion again, but how and where exactly to use it I don't fully understand (and perhaps the point is something else entirely). I've also seen a non-recursive algorithm that finds all the combinations of given numbers, https://rosettacode.org/wiki/Combinations#C.23, but I couldn't do anything with it (I attach my attempt below). Could you please help me?
public static IEnumerable<int[]> Combinations(int[] a, int n, int m)
{
    int[] result = new int[m];
    Stack<int> stack = new Stack<int>();
    stack.Push(0);
    while (stack.Count > 0)
    {
        int index = stack.Count - 1;
        int value = stack.Pop();
        while (value < n)
        {
            result[index++] = ++value;
            stack.Push(value);
            if (index == m)
            {
                for (int i = 0; i < 3; i++)
                {
                    a = a.Where(val => val != result[i]).ToArray();
                }
                return Combinations(a, n - 3, m);
                break;
            }
        }
    }
}
Assuming n is a multiple of 3, there is a simple and intuitive recursive algorithm. (Writing it efficiently is a bit more of a challenge :-) ).
In pseudocode, generalising 3 to k:
# A must have a multiple of k elements
# I write V \ C to mean "V without the values in C". Since producing
# copies is expensive, you should find a more efficient way of doing
# this.
Partition(A, k):
    If A has k elements, produce the partition consisting only of A
    Otherwise:
        Let m be the smallest element of A.
        For each combination C of k-1 elements from A \ [m]:
            Add m to C
            For each partition P generated by Partition(A \ C, k):
                produce P with the addition of C
Of course, that depends on you having access to an algorithm which can enumerate the k-combinations of a list. (Even better would be a function which produced successive shuffles of the list with different k-combinations at the beginning, while maintaining the list in order. Sadly, few standard libraries provide that.)
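Here is a rough C# rendering of that first algorithm, assuming the input is a sorted list of distinct values; the Combinations helper is written inline for completeness, and the liberal copying of sub-lists keeps it short rather than efficient:
using System;
using System.Collections.Generic;
using System.Linq;

static class Partitioner
{
    // Enumerate all ways to split 'items' (sorted, distinct) into disjoint
    // groups of size k. Assumes items.Count is a multiple of k.
    public static IEnumerable<List<int[]>> Partitions(List<int> items, int k)
    {
        if (items.Count == k)
        {
            yield return new List<int[]> { items.ToArray() };
            yield break;
        }

        int m = items[0];                    // the smallest element
        var rest = items.Skip(1).ToList();

        // For each (k-1)-combination C of the rest, the first group is {m} plus C;
        // recurse on whatever is left over.
        foreach (var combo in Combinations(rest, k - 1))
        {
            var group = new[] { m }.Concat(combo).ToArray();
            var remaining = rest.Except(combo).ToList();
            foreach (var partition in Partitions(remaining, k))
            {
                partition.Insert(0, group);
                yield return partition;
            }
        }
    }

    // Plain recursive k-combination enumerator (in lexicographic order).
    static IEnumerable<int[]> Combinations(List<int> items, int k)
    {
        if (k == 0) { yield return Array.Empty<int>(); yield break; }
        for (int i = 0; i <= items.Count - k; i++)
            foreach (var tail in Combinations(items.Skip(i + 1).ToList(), k - 1))
                yield return new[] { items[i] }.Concat(tail).ToArray();
    }
}
For the n = 9 example, Partitions(Enumerable.Range(1, 9).ToList(), 3) should enumerate the 280 partitions counted further down.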
There's another recursive algorithm, which can easily be made into an iterative algorithm by maintaining an explicit stack. It's possibly not quite as intuitive, although once you see it, how it works is pretty obvious, but it's a lot easier to implement efficiently. It requires us to maintain the invariants that each set in the partition is stored in increasing order, and that the sets themselves are sorted in increasing order by their first element. (The order itself is irrelevant, and it's totally reasonable to just assume that the original order of the elements is the desired sort order, as long as the elements are kept in a data structure whose ordering is constant.)
Once you establish that rule, you can start by making all the partition's sets empty, and then place each successive element in order, using each of the possible locations which obey the following simple constraints:
Once a set contains the correct number of elements, no more elements can be added to it.
Each element is placed at the end of a set (because all the already-placed elements are smaller and all the elements yet to be placed are bigger);
An element can only be added to an empty set if it is the first empty set in the partition (to guarantee that the sets themselves will be sorted).
To avoid constantly copying the sets in the partition, you can implement this by using a fixed-size two-dimensional array of n ⁄ k rows and k columns, where each row represents one set in the partition; it's then necessary to keep another array of n ⁄ k integers recording the current length of each set.
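A C# sketch of this placement scheme (the names Enumerate and Place are illustrative); note that the grid and length arrays are reused between callbacks, so copy them if you want to keep a partition:
using System;

static class PlacementPartitioner
{
    // Enumerates all partitions of {1..n} into groups of k by placing each
    // element, in order, at the end of a row that is not yet full, and only
    // into the first empty row (the three rules above). 'visit' is called
    // once per complete partition with the (n/k) x k grid.
    public static void Enumerate(int n, int k, Action<int[,]> visit)
    {
        int groups = n / k;
        var grid = new int[groups, k];   // one row per set in the partition
        var lengths = new int[groups];   // current fill level of each row
        Place(1, n, k, grid, lengths, visit);
    }

    static void Place(int element, int n, int k, int[,] grid, int[] lengths, Action<int[,]> visit)
    {
        if (element > n) { visit(grid); return; }

        for (int g = 0; g < lengths.Length; g++)
        {
            if (lengths[g] == k) continue;          // this set already has k elements
            grid[g, lengths[g]] = element;          // place at the end of the set
            lengths[g]++;
            Place(element + 1, n, k, grid, lengths, visit);
            lengths[g]--;
            if (lengths[g] == 0) break;             // only the first empty set may be used
        }
    }
}
Calling PlacementPartitioner.Enumerate(9, 3, grid => { /* use or copy grid */ }) should again visit the 280 partitions for n = 9.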
One advantage of the first algorithm is that it makes it reasonably obvious how many possible partitions there are, because the number of partitions generated by the inner loop is independent of the combination chosen in the outer loop. Consequently, if we write P(n, k) for the number of k-partitions of n objects, we can see that
P(n, k) = C(n−1, k−1) × P(n−k, k) for n>0   (where C(n, k) is the binomial coefficient)
That's simply a product of binomial coefficients:
P(n, k) = C(n−1, k−1) × C(n−k−1, k−1) × C(n−2k−1, k−1) × … × C(k−1, k−1)
Since C(n, k) is n! ⁄ (k! (n−k)!), that can be simplified to n! ⁄ (d! × (k!)^d) where d is the number of sets in each partition, i.e. d = n ⁄ k. That number is obviously a lot smaller than n! but it still grows extremely rapidly, making large arguments to the partition function impractical. For k=3, the first few counts are:
P( 3, 3) = 1
P( 6, 3) = 10
P( 9, 3) = 280
P(12, 3) = 15,400
P(15, 3) = 1,401,400
P(18, 3) = 190,590,400
P(21, 3) = 36,212,176,000
For this reason, it's usually advisable to generate and use the possible values one at a time rather than attempting to stash them all into a massive vector, which would take up a lot of memory.

Does C# have a built-in way of converting a binary number to an array of bit powers (e.g. 0b1101 -> {0, 2, 3})?

(Edit - preface: I am implementing an iterator over all subsets of a given size.
To get the next combination, I am using Gosper's hack to quickly get the 0/1 vector of the lexicographically next combination. Now I need to quickly map the combination vector to the array of elements from my set. Luckily, the elements are the very same as the powers of the individual bits, and I am wondering whether C# has a fast shortcut for that.)
If I get the K-th subset (in lexicographical order) of the numbers 0 to (N-1), the bits in the binary representation of K tell me which elements I should choose. What is the most elegant way of checking which bits are set and building a subset (array) based on them?
Something like:
var BitPowers = new List<int>();
for (int i = 0; i < N; ++i)
{
    if ((K & (1 << i)) != 0)
    {
        BitPowers.Add(i);
    }
}
return BitPowers.ToArray();
would probably suffice, but is it the best way? I guess bit operations are quick, but as the number of possible sets is exponential, optimizing this function as much as I can would be ideal.
As far as I know, there is no built-in .NET API for this.
The LINQ magic
You can write this compact code in one assignment, but it is less optimized for speed:
int value = 0b10110010;
var BitPowers = Convert.ToString(value, 2)
    .Reverse()
    .Select((bit, index) => new { index, bit })
    .Where(v => v.bit == '1')
    .Select(v => v.index);
foreach (int index in BitPowers)
    Console.WriteLine(index);
It converts the integer to a binary string representation, reversed so the indexes run from the least significant bit upwards, then it pairs each bit with its index, filters to the bits that are set, and finally selects their indexes to create the enumerable result.
Output
1
4
5
7
Compromise between elegance and speed
You can simplify your loop using a BitArray instance.
Perhaps it is the closest "built-in" way to what you ask for:
var bits = new BitArray(BitConverter.GetBytes(value));
for (int index = 0; index < bits.Length; index++)
    if (bits[index])
        BitPowers.Add(index);
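If speed matters more than elegance, and assuming a runtime where System.Numerics.BitOperations is available (.NET Core 3.0 or later), a common bit-twiddling variant loops once per set bit instead of once per candidate bit:
using System.Collections.Generic;
using System.Numerics;   // BitOperations (.NET Core 3.0+)

static int[] BitPositions(int k)
{
    var positions = new List<int>();
    uint bits = (uint)k;
    while (bits != 0)
    {
        positions.Add(BitOperations.TrailingZeroCount(bits));  // index of the lowest set bit
        bits &= bits - 1;                                       // clear the lowest set bit
    }
    return positions.ToArray();
}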

Find remaining elements in the sequence

Hello everyone. I have this small task to do:
There are two sequences of numbers:
A[0], A[1], ... , A[n].
B[0], B[1], ... , B[m].
Do the following operations with the sequence A:
Remove the items whose indices are divisible by B[0].
In the items that remain, remove those whose indices are divisible by B[1].
Repeat this process up to B[m].
Output the items finally remained.
The input looks like this (where -1 is the delimiter between the two sequences A and B):
1 2 4 3 6 5 -1 2 -1
Here is my code (explanation in the comments):
List<int> result = new List<int>(); // list for sequence A
List<int> values = new List<int>(); // list holding the values to remove
var input = Console.ReadLine().Split().Select(int.Parse).ToArray();
var len = Array.IndexOf(input, -1); // index of the first -1 (delimiter)
result = input.ToList(); // converting the input array to a List
result.RemoveRange(len, input.Length - len); // deleting everything from the first delimiter onwards (including it)
for (var i = len + 1; i < input.Length - 1; i++) // for each element in sequence B
{
    for (var j = 0; j < result.Count; j++) // going through all elements in sequence A
    {
        if (j % input[i] == 0) // if the index is divisible by B[i]
        {
            values.Add(result[j]); // adding the associated value to List<int> values
        }
    }
    foreach (var value in values) // after all elements in sequence A have been looked at, deleting those that matched the criteria
    {
        result.Remove(value);
    }
}
But the problem is that I'm only passing 5 of the 11 test cases: roughly a quarter fail with 'Wrong result' and the rest of the failures are 'Timed out'. I understand that my code is probably badly written, but I really can't figure out how to improve it.
So, if someone more experienced could explain (clarify) the following points to me, it would be very cool:
1. Am I parsing the console input correctly? I feel it could be done in a more elegant/efficient way.
2. Is my approach of collecting the values that match the criteria and then deleting them afterwards efficient in terms of performance? Or is there another way to do it?
3. Why is this code not passing all test cases, and how would you change it in order to pass all of them?
I'm writing the answer once again, since I had misunderstood the problem completely. Undoubtedly the problem in your code is the removal of elements, so let's try to avoid that. Let's make a new array C where you store all the numbers that should be left in the A array after each removal pass. So if the index id is not divisible by B[i], you add A[id] to the array C. Then, after checking all the indices against B[i], you replace the array A with the array C and do the same for B[i + 1]. Repeat until you reach the end of the array B.
The algorithm:
1. For each value in B:
2.     For each id from 1 to length(A):
3.         If id % value != 0, add A[id] to C
4.     A = C
5. Return A.
EDIT: Be sure to make a new array C for each iteration of loop 1 (or clear C after replacing A with it).
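A compact C# sketch of this rebuild approach; whether indices are 0-based or 1-based depends on the task statement, so 1-based is assumed here to match the pseudocode:
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var input = Console.ReadLine().Split().Select(int.Parse).ToArray();
        int delimiter = Array.IndexOf(input, -1);
        var a = input.Take(delimiter).ToList();
        var b = input.Skip(delimiter + 1).TakeWhile(x => x != -1).ToList();

        foreach (int step in b)
        {
            var c = new List<int>();                 // fresh list for each pass
            for (int id = 1; id <= a.Count; id++)    // 1-based index (assumed)
                if (id % step != 0)
                    c.Add(a[id - 1]);
            a = c;                                   // replace A with the survivors
        }

        Console.WriteLine(string.Join(" ", a));
    }
}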

Comparing 1 million integers in an array without sorting it first

I have a task to find the difference between every pair of integers in an array of random numbers and return the lowest difference. A requirement is that the integers are between 0 and int.MaxValue and that the array will contain 1 million integers.
I put some code together which works fine for a small number of integers, but it takes a long time (so long that most of the time I give up waiting) to do a million. My code is below; I'm looking for some insight on how I can improve performance.
for (int i = 0; i < _RandomIntegerArray.Count(); i++) {
    for (int ii = i + 1; ii < _RandomIntegerArray.Count(); ii++) {
        if (_RandomIntegerArray[i] == _RandomIntegerArray[ii]) continue;
        int currentDiff = Math.Abs(_RandomIntegerArray[i] - _RandomIntegerArray[ii]);
        if (currentDiff < lowestDiff) {
            Pairs.Clear();
        }
        if (currentDiff <= lowestDiff) {
            Pairs.Add(new NumberPair(_RandomIntegerArray[i], _RandomIntegerArray[ii]));
            lowestDiff = currentDiff;
        }
    }
}
Apologies to everyone that has pointed out that I don't sort; unfortunately sorting is not allowed.
Imagine that you have already found a pair of integers a and b from your random array such that a > b and a-b is the lowest among all possible pairs of integers in the array.
Does an integer c exist in the array such that a > c > b, i.e. c goes between a and b? Clearly, the answer is "no", because otherwise you'd pick the pair {a, c} or {c, b}.
This gives an answer to your problem: a and b must be next to each other in a sorted array. Sorting can be done in O(N log N), and the search can be done in O(N), an improvement over the O(N²) algorithm that you have.
As per @JonSkeet, try sorting the array first and then compare only consecutive array items, which means that you only need to iterate the array once:
Array.Sort(_RandomIntegerArray);
for (int i = 1; i < _RandomIntegerArray.Count(); i++)
{
    int currentDiff = _RandomIntegerArray[i] - _RandomIntegerArray[i - 1];
    if (currentDiff < lowestDiff)
    {
        Pairs.Clear();
    }
    if (currentDiff <= lowestDiff)
    {
        Pairs.Add(new NumberPair(_RandomIntegerArray[i], _RandomIntegerArray[i - 1]));
        lowestDiff = currentDiff;
    }
}
In my testing this results in < 200 ms elapsed for 1 million items.
You've got a million integers out of a possible 2.15 or 4.3 billion (signed or unsigned). That means the largest possible min distance is either about 2150 or 4300. Let's say that the max possible min distance is D.
Divide the legal integers into groups of length D. Create a hash h keyed on integers with arrays of ints as values. Process your array by taking each element x, and adding it to h[x/D].
The point of doing this is that any valid pair of points is either contained in h(k) for some k, or collectively in h(k) and h(k+1).
Find your pair of points by going through the keys of the hash and checking the points associated with adjacent keys. You can sort if you like, or use a bitvector, or any other method but now you're dealing with small arrays (on average 1 element per array).
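Here is a rough C# sketch of that bucketing idea; the names are illustrative and at least two input values are assumed:
using System;
using System.Collections.Generic;

static class ClosestPair
{
    // Pigeonhole version of the answer above: with n values in [0, int.MaxValue],
    // the smallest gap is at most about int.MaxValue / (n - 1), so the closest
    // pair must land in the same bucket of that width, or in adjacent buckets.
    public static int MinDifference(int[] values)
    {
        long d = (long)int.MaxValue / (values.Length - 1) + 1;   // bucket width D
        var buckets = new Dictionary<long, List<int>>();
        foreach (int x in values)
        {
            long key = x / d;
            if (!buckets.TryGetValue(key, out var list))
                buckets[key] = list = new List<int>();
            list.Add(x);
        }

        int best = int.MaxValue;
        foreach (var kv in buckets)
        {
            var list = kv.Value;
            buckets.TryGetValue(kv.Key + 1, out var next);
            // Buckets hold about one element each on average, so these inner
            // loops are cheap even though they look quadratic.
            for (int i = 0; i < list.Count; i++)
            {
                for (int j = i + 1; j < list.Count; j++)
                    best = Math.Min(best, Math.Abs(list[i] - list[j]));
                if (next != null)
                    foreach (int y in next)
                        best = Math.Min(best, Math.Abs(y - list[i]));
            }
        }
        return best;
    }
}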
As the elements of the array are between 0 and int.MaxValue, I suppose the maximum value might be less than 1 million. If so, you just need to initialise array[maxvalue] to 0 and then, as you read the 1 million values, increment the corresponding entry in the array.
Now read this array and find the lowest difference as described by others, as if all the values were sorted. If any value is present more than once, its count will be > 1, so you can immediately say that the minimum difference is 0.
NOTE: this method is only suitable if you cannot sort and, more importantly, the maximum value is much smaller than 10^6 (1 million).
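A small sketch of that counting-array idea, under the stated assumption that every value is below some small bound (1,000,000 is assumed here for illustration):
const int maxValue = 1_000_000;          // assumed upper bound, not from the question
var counts = new int[maxValue];
foreach (int x in _RandomIntegerArray)
    counts[x]++;

int lowestDiff = int.MaxValue;
int previous = -1;
for (int v = 0; v < maxValue; v++)
{
    if (counts[v] == 0) continue;
    if (counts[v] > 1) { lowestDiff = 0; break; }              // duplicate value: difference 0
    if (previous >= 0) lowestDiff = Math.Min(lowestDiff, v - previous);
    previous = v;
}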
It helps a little if you do not call Count() on each iteration:
int countIntegers = _RandomIntegerArray.Count();
for (int i = 0; i < countIntegers; i++) {
    //...
    for (int ii = i + 1; ii < countIntegers; ii++) {
        //...
That is, given that Count() just returns the number of ints in the array on each call, without modifying the array or caching the result between calls.
How about splitting the array into chunks of size arraysize / number-of-processors and running each chunk in a different thread? (Neil)
Assume three parts A, B and C of sizes as close to equal as possible.
For each part, find the minimum "in-part" difference, and also the minimum difference over pairs whose first component is from the current part and whose second is from the next part (A being the part that follows C).
With a method taking O(n²) time, a part of size n/3 takes about one ninth of the effort; doing this 2*3 = 6 times amounts to two thirds of the original work, plus change for combining the results.
This calls to be applied recursively - remember Karatsuba multiplication?
Wait - maybe use two parts after all, for three fourths of the effort - very close to "Karatsuba". (When I didn't see how to use an even number of parts, I was thinking of multiprocessing with every processor doing "the same".)

Order a list by a field then by random

var r = new Random();
var orderedList = aListOfPeople.OrderBy(x => x.Age).ThenBy(x => r.Next());
What would be a better way of ordering a list by "age" and then by random?
My goal is to make sure that if PersonA age = PersonB age, PersonA will come first on some occasions and PersonB will come first on some other occasions.
Using the technique from SQL
var orderedList = aListOfPeople.OrderBy(x => x.Age).ThenBy(x => Guid.NewGuid());
Warning: it is not truly random, just a lazy approach; please refer to the comment section of the question.
The simplest answer is to shuffle and then sort. If you use a stable sort then the sort must preserve the shuffled order for equal-keyed values. However, even though an unstable sort will perturb your shuffle I can't think of any reasonable case in which it could un-shuffle equal-keyed values.
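For reference, a minimal sketch of that shuffle-then-sort idea, relying on LINQ's OrderBy being a stable sort:
var rnd = new Random();
// Fisher-Yates shuffle first, then a stable sort by age;
// ties keep their shuffled relative order.
var shuffled = aListOfPeople.ToList();
for (int i = shuffled.Count - 1; i > 0; i--)
{
    int j = rnd.Next(i + 1);
    (shuffled[i], shuffled[j]) = (shuffled[j], shuffled[i]);
}
var orderedList = shuffled.OrderBy(x => x.Age).ToList();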
That might be a little inefficient, though...
If you're concerned about collisions, I might assume your ages are integer ages in years. In that case you might consider a radix sort (256 bins will be enough for any living human), and when it comes time to stitch the bins back together, you would remove elements from each bin in random order as you append them to the list.
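A sketch of that bin-and-shuffle idea, assuming integer ages in 0..255 and a Person type with an Age property as in the question:
var rnd = new Random();
var bins = new List<Person>[256];
foreach (var p in aListOfPeople)
{
    if (bins[p.Age] == null) bins[p.Age] = new List<Person>();
    bins[p.Age].Add(p);
}

var orderedList = new List<Person>(aListOfPeople.Count);
foreach (var bin in bins)
{
    if (bin == null) continue;
    // Empty the bin in random order while appending to the result.
    while (bin.Count > 0)
    {
        int k = rnd.Next(bin.Count);
        orderedList.Add(bin[k]);
        bin[k] = bin[bin.Count - 1];    // swap-remove to keep removal O(1)
        bin.RemoveAt(bin.Count - 1);
    }
}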
If your list is already sorted by age, and you just want to shuffle in-place, then you just need to iterate through the list, count how many of the following elements are equal, perform an in-place shuffle of that many elements, and then advance to the next non-matching element and repeat.
The latter would be something like this, I think (I'll write it in C because I don't know C#):
int i = 0;
while (i < a.length) {
    /* find the end of the run of elements equal to a[i] */
    int j = i + 1;
    while (j < a.length && a[j] == a[i]) j++;
    /* shuffle the run a[i..j-1] in place */
    while (i + 1 < j) {
        int k = random(j - i) + i;
        swap(a[i], a[k]);
        i++;
    }
    i++;
}
Haven't tested it, but it should give the rough idea.
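In C#, assuming a List<Person> called people that is already sorted by Age (names assumed, not from the question), the same in-place run shuffle might look like:
var rnd = new Random();
int i = 0;
while (i < people.Count)
{
    // Find the end of the run of people with the same age.
    int j = i + 1;
    while (j < people.Count && people[j].Age == people[i].Age) j++;

    // Fisher-Yates shuffle of the run people[i..j-1].
    while (i + 1 < j)
    {
        int k = rnd.Next(i, j);                         // random index in [i, j)
        (people[i], people[k]) = (people[k], people[i]);
        i++;
    }
    i++;
}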
