I want to find out the best way to do this in C#:
I have an array of, let's say, 20 numbers, plus one additional variable.
I want to find the subset of those numbers whose sum is closest to the given variable.
Let's say I have 1.1, 1.5, 1.7, 1.9, 2.2, 3.1, 3.2, 1,5, 4.5, 4.1, and the additional variable has a value of 5.
I want to get the sum of some numbers in the array that is closest to the given number and, once I have it, remove those numbers from the list and add them to a new array.
Every comment is welcome.
Thanks
You are describing the optimization variant of the subset sum problem.
The problem is NP-complete, so no polynomial-time solution is known.
However, since the input is fairly small, an exponential solution that checks all subsets is feasible: there are only 2^20 = 1,048,576 subsets.
The pseudocode should be something like:
getClosestSum(list, sum, number):
    if (list is empty):
        return sum
    candidate1 <- getClosestSum(list[1:], sum, number)
    candidate2 <- getClosestSum(list[1:], sum + list[0], number)
    if (abs(number - candidate1) < abs(number - candidate2)):
        return candidate1
    else:
        return candidate2
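The pseudocode above returns only the best sum, while the question also asks for the chosen numbers to be moved to a new collection. A minimal C# sketch of the same brute-force idea, assuming the input has at most about 20 to 30 values; the class and method names (ClosestSubsetSum, FindClosest, ExtractClosest) are made up for illustration:

```csharp
using System;
using System.Collections.Generic;

static class ClosestSubsetSum
{
    // Tries every subset via a bitmask (2^n subsets), so it is only practical
    // for small inputs such as the roughly 20 values mentioned in the question.
    public static List<double> FindClosest(IReadOnlyList<double> values, double target)
    {
        if (values.Count > 30)
            throw new ArgumentException("Brute force is only sensible for small inputs.");

        int bestMask = 0;                    // the empty subset, whose sum is 0
        double bestDiff = Math.Abs(target);
        int subsetCount = 1 << values.Count;

        for (int mask = 1; mask < subsetCount; mask++)
        {
            double sum = 0;
            for (int i = 0; i < values.Count; i++)
                if ((mask & (1 << i)) != 0) sum += values[i];

            double diff = Math.Abs(target - sum);
            if (diff < bestDiff) { bestDiff = diff; bestMask = mask; }
        }

        var chosen = new List<double>();
        for (int i = 0; i < values.Count; i++)
            if ((bestMask & (1 << i)) != 0) chosen.Add(values[i]);
        return chosen;
    }

    // Removes the chosen values from 'source' (one occurrence each) and returns them.
    public static List<double> ExtractClosest(List<double> source, double target)
    {
        List<double> chosen = FindClosest(source, target);
        foreach (double v in chosen) source.Remove(v);
        return chosen;
    }
}
```

Calling ClosestSubsetSum.ExtractClosest(list, 5.0) returns the new collection and shrinks the original list. When several subsets are equally close, this sketch keeps the first one it encounters.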
I want to generate random numbers within a range (1 - 100000), but instead of purely random I want the results to follow a kind of distribution. What I mean is that in general I want the numbers "clustered" around the minimum value of the range (1).
I've read about the Box–Muller transform and normal distributions, but I'm not quite sure how to use them to build the number generator.
How can I achieve such an algorithm using C#?
There are a lot of ways of doing this (using a uniformly distributed PRNG); here are a few I know of:
1. Combine several uniform random variables to obtain the desired distribution.
I am not a math guy, but there are known equations for this. This kind of solution usually has the best properties from a randomness and statistical point of view. For more info see the famous:
Understanding “randomness”.
However, there is only a limited number of distributions for which we know the combinations.
2. Apply a nonlinear function to a uniform random variable
This is the simplest to implement. Take floating-point random values in the <0..1> range, apply a nonlinear function to them (one that bends the distribution towards the shape you want while keeping the result in the <0..1> range), and rescale the result to your integer range. For example (in C++):
floor( pow( random(),5 ) * 100000 )
The problem is that this is just blind fitting of the distribution, so you usually need to tweak the constants a bit. It is a good idea to render histogram and randomness graphs to see the quality of the result directly, like here:
How to seed to generate random numbers?
You can also make the fitting less blind by using Bezier curves, like here:
Random but most likely 1 float
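Since the question asks for C#, the nonlinear-function approach above could be sketched roughly as follows. The class name SkewedRandom and the default exponent are assumptions for illustration; the exponent 5 is simply the constant from the C++ one-liner and would need tuning against a histogram:

```csharp
using System;

static class SkewedRandom
{
    // Returns an integer in [min, max], clustered near min.
    // An exponent above 1 pushes more of the results toward min.
    public static int Next(Random rng, int min, int max, double exponent = 5.0)
    {
        double u = rng.NextDouble();            // uniform in [0, 1)
        double skewed = Math.Pow(u, exponent);  // still in [0, 1), biased toward 0
        long span = (long)max - min + 1;        // number of possible results
        long offset = Math.Min((long)Math.Floor(skewed * span), span - 1);
        return (int)(min + offset);
    }
}
```

For the range in the question this would be called as SkewedRandom.Next(rng, 1, 100000). With exponent 5, about 63% of the results fall in the lowest tenth of the range, because u^5 < 0.1 exactly when u < 0.1^(1/5), which is about 0.63.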
3. Distribution-following pseudorandom generator
There are two approaches I know of for this. The simpler one is:
create a big enough array of size n
fill it with values following the distribution
Loop through all the values you want to output, compute how many copies of each should appear in an array of size n (from your distribution), and add that many copies to the array. Beware that the filled size of the array might be slightly less than n due to rounding. If n is too small, you will be missing some of the rarer numbers, so the probability of the least probable number multiplied by n should be at least 1. After filling, set n to the real array size (the number of values actually stored).
shuffle the array
now use the array as a linear list of random numbers
So instead of calling random() you just pick the next number from the array. Once you pass the n-th value, shuffle the array and start from the first one again.
This solution has very good statistical properties (it follows the distribution exactly), but the randomness properties are not good, and it requires an array and occasional shuffling. For more info see:
How to efficiently generate a set of unique random numbers with a predefined distribution?
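The array-based steps above might look like this in C#. This is only a sketch: the DistributionPool type is hypothetical, values are taken to be the indices 0..k-1 of a probability array, and counts are truncated, so the pool can end up slightly smaller than the requested size, as the text warns:

```csharp
using System;
using System.Collections.Generic;

sealed class DistributionPool
{
    private readonly int[] pool;
    private readonly Random rng;
    private int position;

    // probability[v] is the desired probability of value v; size is the target pool size n.
    public DistributionPool(double[] probability, int size, Random rng)
    {
        this.rng = rng;
        var items = new List<int>(size);
        for (int v = 0; v < probability.Length; v++)
        {
            int copies = (int)(probability[v] * size);  // truncated, so the pool may be < n
            for (int c = 0; c < copies; c++) items.Add(v);
        }
        if (items.Count == 0)
            throw new ArgumentException("Pool size is too small for this distribution.");
        pool = items.ToArray();                         // n becomes the real filled size
        Shuffle();
    }

    // Fisher-Yates shuffle of the whole pool.
    private void Shuffle()
    {
        for (int i = pool.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (pool[i], pool[j]) = (pool[j], pool[i]);
        }
    }

    // Returns the next value; reshuffles and restarts once the pool is exhausted.
    public int Next()
    {
        if (position == pool.Length) { Shuffle(); position = 0; }
        return pool[position++];
    }
}
```

Within one pass over the pool every value appears exactly its allotted number of times, which is where both the exact statistics and the weaker randomness come from.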
The other variation avoids the array and the shuffling. It goes like this:
get a random value in the range <0..1>
apply the inverse cumulative distribution function to convert it to the target range
As you can see, it is like the #2 "Apply non linear function" approach, but instead of "some" nonlinear function you use the distribution directly. So if p(x) is the probability of x, in the range <0..1> where 1 means 100%, then we need a function that accumulates all the probabilities up to x (the cumulative distribution function). For integers:
f(x) = p(0)+p(1)+...+p(x)
Now we need its inverse function g(), so that:
y = f(x)
x = g(y)
Now if my memory serves me well then the generation should look like this:
y = random(); // <0..1>
x = g(y); // probability -> value
Many distributions have a known g() function, but for those that do not (or when we are too lazy to derive it) you can use binary search on f(x), which is non-decreasing. I am too lazy to code that, so here is the slower linear-search version:
for (x=0;x<max;x++) if (f(x)>=y) break;
So when all of it is put together (using only p(x)), I get this (C++):
y=random(); // uniform distribution pseudo random value in range <0..1>
for (f=0.0,x=0;x<max;x++) // loop x through all values
    {
    f+=p(x); // f(x) cumulative distribution function
    if (f>=y) break;
    }
// here x is your pseudo random value following p(x) distribution
This kind of solution usually has very good statistical and randomness properties, and it does not require the distribution to be a continuous function (it can even be just an array of values).
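The binary-search version mentioned above could be sketched in C# as follows. The DiscreteSampler type is hypothetical, and p is assumed to be an array of probabilities summing to about 1. The sketch uses a strict comparison f(x) > y rather than the >= of the loop above, so that a value with probability 0 is never returned when y is exactly 0:

```csharp
using System;

sealed class DiscreteSampler
{
    private readonly double[] cumulative;  // cumulative[x] = p(0) + ... + p(x)

    public DiscreteSampler(double[] p)
    {
        if (p.Length == 0) throw new ArgumentException("Empty distribution.");
        cumulative = new double[p.Length];
        double sum = 0;
        for (int x = 0; x < p.Length; x++)
        {
            sum += p[x];
            cumulative[x] = sum;
        }
    }

    // g(y): the smallest x whose cumulative probability exceeds y.
    // Falls back to the last index if rounding leaves the total slightly below y.
    public int Map(double y)
    {
        int lo = 0, hi = cumulative.Length - 1;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (cumulative[mid] > y) hi = mid;
            else lo = mid + 1;
        }
        return lo;
    }

    // Draws one value following p(x) from a uniform random number in [0, 1).
    public int Next(Random rng) => Map(rng.NextDouble());
}
```

The cumulative array is built once in O(max) time, after which each draw costs O(log max) instead of the O(max) of the linear search.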
Is there a .NET Framework function to bin (bucket) numerical values, for example to prepare data for a histogram chart?
I find it odd that I might have to code one up myself.
I am probably not searching with the right keywords.
I don't think there is a function that will automatically prepare data for a histogram (including the calculation of the right number of buckets), but you can quite easily create histograms using Seq.countBy.
For example, given a sequence of numbers nums between -1 and 1, you can write something like:
nums
|> Seq.countBy (fun v -> round(v*10.0))
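This creates buckets of width 0.1 and returns the count of numbers in each bucket. Because round is used, each bucket is centered on a multiple of 0.1 (..., -0.1, 0.0, 0.1, 0.2, ...) and covers roughly 0.05 on either side; use floor instead of round if you want the intervals ..., [-0.1, 0.0), [0.0, 0.1), [0.1, 0.2), ... etc. If you pipe the result to the Chart.Bar function from F# Charting, you'll get a reasonably nice histogram.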
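The same idea can be written in C# with LINQ's GroupBy, which plays the role of Seq.countBy here. A minimal sketch, with a made-up Histogram helper; it uses Math.Floor, so each bucket is the half-open interval [lower bound, lower bound + width):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Histogram
{
    // Returns (bucket lower bound, count) pairs, ordered by bucket.
    // Empty buckets are not included in the result.
    public static List<(double LowerBound, int Count)> Bucket(
        IEnumerable<double> values, double width)
    {
        return values
            .GroupBy(v => Math.Floor(v / width))   // bucket index of each value
            .OrderBy(g => g.Key)
            .Select(g => (LowerBound: g.Key * width, Count: g.Count()))
            .ToList();
    }
}
```

As in the F# version, the number of buckets is not chosen automatically; the caller picks the width.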
I'm working on a simple game and I have the requirement of taking a word or phrase such as "hello world" and converting it to a series of numbers.
The criteria are:
Numbers need to be distinct.
Need the ability to configure the maximum length of the sequence, i.e. 10 numbers in total.
Need the ability to configure the maximum value of each number in the sequence.
Must be deterministic, that is, we should get the same sequence every time for the same input phrase.
I've tried breaking down the problem like so:
Convert characters to their ASCII codes: "hello world" = 104 101 108 108 111 32 119 111 114 108 100
Remove every other number until we reach the required count (10 in this case)
For each number, if number > max number, divide by 2 until number <= max number
If any numbers are duplicated, increase or decrease the first occurrence until they are distinct. (This could cause a problem, as you could create a duplicate by resolving another duplicate.)
Is there a better way of doing this, or am I on the right track? As stated above, I think I may run into issues keeping the numbers distinct.
If you want to limit the size of the output series, then giving every input a different output is impossible.
Proof:
Assume your output is a series of size k, with each number in a range of size at most M for some predefined M. Then there are at most M^k possible outputs.
However, there is an infinite number of inputs, and in particular there are M^k + 1 different inputs.
By the pigeonhole principle (where the inputs are the pigeons and the outputs are the pigeonholes), there are 2 pigeons (inputs) in one pigeonhole (output), so the requirement cannot be achieved.
Original answer, which provides a workaround without limiting the size of the output series:
You can use prime numbers. Let p1,p2,... be a series of distinct prime numbers.
Then, convert the string into a series of numbers using number[i] = ascii(char[i]) * p_i
The range of each number is then obviously [0,255 * p_i]
If every prime used is larger than 255, then for i != j and any nonzero character codes x,y we have p_i * x != p_j * y (p_i would have to divide y, which is smaller than p_i), so you get uniqueness. With small primes this can fail, e.g. 2 * 3 = 3 * 2. However, this is mainly nice theoretically, as the generated numbers grow with the length of the input, and for a practical implementation you may need a big-number library such as Java's BigInteger (in C#, System.Numerics.BigInteger).
Another possible solution (with the same relaxation of no series limitation) is:
number[i] = ascii(char[i]) + 256*(i-1)
In here the range for number[i] is [256*(i-1),256*i), and elements are still distinct.
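The offset formula above is easy to try out. A small C# sketch, assuming every character code is below 256 and using a zero-based index, so the i-th number lies in [256*i, 256*(i+1)); the PhraseEncoder name is made up for illustration:

```csharp
using System;
using System.Linq;

static class PhraseEncoder
{
    // number[i] = code(char[i]) + 256 * i, with i counted from 0.
    // Each position gets its own block of 256 values, so all numbers are distinct.
    public static int[] Encode(string phrase)
    {
        return phrase.Select((c, i) =>
        {
            if (c > 255)
                throw new ArgumentException("Only character codes 0..255 are supported.");
            return checked(c + 256 * i);
        }).ToArray();
    }
}
```

The numbers are distinct and deterministic, but neither the length of the sequence nor the size of each number is bounded, which is the relaxation this answer relies on.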
Mathematically, it is theoretically possible to do what you want, but you won't be able to do it in C#:
If your outputs are required to be distinct, then you cannot lose any information after encoding the string using ASCII values. This means that if you limit your output size to n numbers, then those numbers will have to include all the information from the encoding.
So for your example
"hello world" -> 104 101 108 108 111 32 119 111 114 108 100
you would have to preserve the meaning of each of those numbers. The simplest way to do this would be to zero-pad your numbers to three digits and concatenate them into one large number, making your result 104101108108111032119111114108100 for max numbers = 1.
(You can see where the issue lies: for arbitrary-length input you need very large numbers.) So it is certainly possible to encode any arbitrary-length string input to n numbers, but the numbers will become exceedingly large.
If by "numbers" you meant digits, then no, you cannot have distinct outputs, as @amit explained in his example with the pigeonhole principle.
Let's eliminate your criteria as easily as possible.
For distinct and deterministic, just use a hash code. (A hash actually isn't guaranteed to be distinct, but is highly likely to be):
string s = "hello world";
uint hash = unchecked((uint)s.GetHashCode());
Note that I cast the signed int returned by GetHashCode to unsigned, to avoid the chance of a '-' appearing. An unchecked cast is used because Convert.ToUInt32 throws an OverflowException for negative values.
Then, for your max range per number, just convert the base.
That leaves you with the maximum sequence criterion. Without understanding your requirements better, all I can propose is to truncate if necessary:
hash.ToString().Substring(0, size)
Truncating leaves a chance that you'll no longer be distinct, but that must be built in as acceptable to your requirements. As amit explains in another answer, you can't have infinite input and non-infinite output.
Ok, so in one comment you've said that this is just to pick lottery numbers. In that case, you could do something like this:
public static List<int> GenNumbers(String input, int count, int maxNum)
{
    List<int> ret = new List<int>();
    // Seed the generator with the phrase's hash code.
    Random r = new Random(input.GetHashCode());
    for (int i = 0; i < count; ++i)
    {
        // Pick among the (maxNum - i) values that have not been taken yet.
        int next = r.Next(maxNum - i);
        // Skip over already picked values in ascending order (needs System.Linq).
        foreach (int picked in ret.OrderBy(x => x))
        {
            if (picked <= next)
                ++next;
            else
                break;
        }
        ret.Add(next);
    }
    return ret;
}
The idea is to seed a random number generator with the hash code of the String. The rest is just picking numbers without replacement. I'm sure it could be written more efficiently; an alternative is to generate all maxNum numbers and shuffle the first count. Warning, untested.
Note that String.GetHashCode is not guaranteed to be stable: on .NET Core and later, string hash codes are randomized per process by default (so results will differ between runs), while on .NET Framework the randomization is opt-in. Writing your own hash algorithm is an option.
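To make the "write your own hash" option concrete, here is a hedged sketch that depends on neither String.GetHashCode nor the internals of System.Random. It swaps in two well-known techniques that are not part of the answer above: an FNV-1a hash and an xorshift32 generator, combined with a partial Fisher-Yates shuffle. The names PhraseNumbers and Generate are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class PhraseNumbers
{
    // FNV-1a (32-bit): a simple non-cryptographic hash that gives the same
    // value on every run and platform.
    static uint StableHash(string text)
    {
        uint hash = 2166136261;
        foreach (byte b in Encoding.UTF8.GetBytes(text))
        {
            hash ^= b;
            unchecked { hash *= 16777619; }
        }
        return hash;
    }

    // xorshift32: a tiny PRNG step, used so results do not depend on System.Random.
    static uint NextState(uint state)
    {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        return state;
    }

    // Returns 'count' distinct numbers in [1, maxNum], determined only by 'input'.
    public static List<int> Generate(string input, int count, int maxNum)
    {
        if (count < 0 || count > maxNum)
            throw new ArgumentOutOfRangeException(nameof(count));

        uint state = StableHash(input);
        if (state == 0) state = 1;               // xorshift must not start at zero

        int[] pool = new int[maxNum];
        for (int i = 0; i < maxNum; i++) pool[i] = i + 1;

        var result = new List<int>(count);
        for (int i = 0; i < count; i++)
        {
            state = NextState(state);
            // Partial Fisher-Yates step; the slight modulo bias is ignored here.
            int j = i + (int)(state % (uint)(maxNum - i));
            (pool[i], pool[j]) = (pool[j], pool[i]);
            result.Add(pool[i]);
        }
        return result;
    }
}
```

For example, PhraseNumbers.Generate("hello world", 10, 49) yields ten distinct numbers between 1 and 49 and returns the same ten on every run. Different phrases can still map to the same sequence, as the pigeonhole argument elsewhere in this thread implies.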
This might be more Math related than C#, but I need a C# solution so I'm putting it here.
My question is about the probability of random number generators, more specifically if each possible value is returned with an equal probability.
I know there is the Random.Next(int, int) method, which returns a number between the first integer and the last (with the last being exclusive).
Random.Next() [with no arguments] returns a value between 0 and Int32.MaxValue - 1 (Int32.MaxValue is 2147483647), so at most 2147483646.
If I want a value between 1 and 10, I could call Random.Next(1, 11) to do this; however, does every value between 1 and 10 have an equal probability of occurring?
For example, the range is 10, so 2147483646 is not perfectly divisible by 10, so the values 1-6 have a slightly higher probability of occurring (because 2147483646 % 10 = 6). This is of course assuming that Random.Next() [with no arguments] returns every value between 0 and 2147483646 with equal probability.
How would one ensure that every number within a range has an equal probability of occurring? Let's say for a lottery-type system, where it would be unfair for some people to have a higher probability than others. I'm not saying I would use the C# built-in RNG for this; I was just using it as an example.
I note that no one actually answered the meaty question in your post:
For example, the range is 10, so 2147483646 is not perfectly divisible by 10, so the values 1-6 have a slightly higher probability of occurring (because 2147483646 % 10 = 6). This is of course assuming that Random.Next() [with no arguments] returns every value between 0 and 2147483646 with equal probability.
How would one ensure that every number within a range has an equal probability of occurring?
Right, so you just throw out the values that cause the imbalance. For example, let's say that you had an RNG that could produce a uniform distribution over { 0, 1, 2, 3, 4 }, and you wanted to use it to produce a uniform distribution over { 0, 1 }. The naive implementation is: draw from { 0, 1, 2, 3, 4 } and then return the value % 2; this, however, would obviously produce a biased sample. This happens because, as you note, 5 (the number of items) is not evenly divisible by 2. So, instead, throw out any draws that produce the value 4. Thus, the algorithm would be
draw from { 0, 1, 2, 3, 4 }
if the value is 4, throw it out and draw again
otherwise, return the value % 2
You can use this basic idea to solve the general problem.
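The general version of the three steps above is usually called rejection sampling. A minimal C# sketch, assuming you have some source that really is uniform over [0, sourceCount); the UniformReducer name and the delegate-based signature are choices made for this illustration:

```csharp
using System;

static class UniformReducer
{
    // 'draw' must return uniformly distributed integers in [0, sourceCount).
    // Returns a uniformly distributed integer in [0, targetCount).
    public static int Next(Func<int> draw, int sourceCount, int targetCount)
    {
        if (targetCount <= 0 || targetCount > sourceCount)
            throw new ArgumentOutOfRangeException(nameof(targetCount));

        // Largest multiple of targetCount that fits into the source range.
        int limit = sourceCount - (sourceCount % targetCount);

        int value;
        do
        {
            value = draw();          // draw from the source
        } while (value >= limit);    // throw out the excess values and redraw

        return value % targetCount;
    }
}
```

For a value between 1 and 10 this could be used as UniformReducer.Next(() => rng.Next(), int.MaxValue, 10) + 1, under the assumption that rng.Next() is uniform over its int.MaxValue possible results. Fewer than half of the draws can ever be rejected, so the loop terminates quickly in practice.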
however, does every value between 1 and 10 have an equal probability of occurring?
Yes, it does. From MSDN:
Pseudo-random numbers are chosen with equal probability from a finite set of numbers.
Edit: Apparently the documentation is NOT consistent with the current implementation in .NET. The documentation states the draws are uniform, but the code suggests that it is not. However, that does NOT negate the fact that this is a soluble problem, and my approach is one way to solve it.
The C# built in RNG is, as you expect, a uniformly distributed one. Every number has an equal likelihood of occurring given the range you specify for Next(min, max).
You can test this yourself (I have) by taking, say, 1M samples and storing how many times each number actually appears. You'll get an almost flat-line curve if you graph it.
Also note that each number having an equal likelihood doesn't mean that each number will occur the same number of times. If you're looking at random numbers from 1 to 10, in 100 iterations, you won't get exactly 10 occurrences of each number. Some numbers may occur 8 times, and others 12 or 13 times. However, with more iterations, this tends to even out somewhat.
Also, since it's mentioned in the comments, I'll add: if you want something stronger, look up cryptographic RNGs (in .NET, System.Security.Cryptography.RandomNumberGenerator). If you only need a better general-purpose PRNG, Mersenne Twister is particularly good from what I've seen (fast, cheap to compute, huge period), though it is not cryptographically secure, and it has open-source implementations in C#.
Test program:
var a = new int[10];
var r = new Random();
for (int i = 0; i < 1000000; i++) a[r.Next(1, 11) - 1]++;
for (int i = 0; i < a.Length; i++) Console.WriteLine("{0,2}{1,10}", i + 1, a[i]);
Output:
 1     99924
 2    100199
 3    100568
 4    100406
 5    100114
 6     99418
 7     99759
 8     99573
 9    100121
10     99918
Conclusion:
Each value is returned with an equal probability.
Ashes and dtb are incorrect: You are right to suspect that some numbers would have a greater chance of occurring than others.
When you call .Next(x, y), there are y - x possible return values. The .NET 4.0 Random class calculates a return value based on the return value of NextDouble() (this is a slightly simplified description).
Obviously, the set of possible double values is finite, and, as you note, it may not be a multiple of the size of the set of possible return values of .Next(x, y). Therefore, assuming that the set of input values is uniformly distributed, some output values will have a slightly greater probability of occurring.
I don't know off hand how many numeric double values there are (i.e., excluding infinity and NaN values), but it is certainly larger than 2^32. In your case, if we assume 2^32 values, for the sake of argument, then we have to map 4294967296 inputs to 10 outputs. Some values would have a 429496730 / 429496729 greater probability of occurring, or 0.00000023283064397913028110629 percent greater. In fact, since the number of input states is greater than 2^32, the difference in probability would be even smaller.
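The arithmetic in the last paragraph can be checked with a few lines of C#. This sketch only reproduces the counting argument (equally likely input states spread as evenly as possible over the outputs); it is not a model of how Random.Next is actually implemented, and the MappingBias name is made up:

```csharp
using System;

static class MappingBias
{
    // Spreads 'inputs' equally likely states over 'outputs' values as evenly as possible.
    // Low and High are the smallest and largest number of states per output;
    // RelativeExcess is how much more likely the favoured outputs are.
    public static (long Low, long High, double RelativeExcess) Compute(long inputs, long outputs)
    {
        long low = inputs / outputs;
        long high = inputs % outputs == 0 ? low : low + 1;
        return (low, high, (double)(high - low) / low);
    }
}
```

MappingBias.Compute(4294967296, 10) gives Low = 429496729 and High = 429496730, with a relative excess of 1/429496729, roughly 2.33e-9, which matches the 0.00000023 percent quoted above.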
I am using C# and have a list of ints containing various numbers, such as {34,36,40,35,37,38,39,4,5,3}. Now I need a script to find the different ranges in the list and write them to a file. For this example they would be (34-40) and (3-5). What is a quick way to do it?
Thanks for the help in advance.
The easiest way would be to sort the array and then do a single sequential pass to capture the ranges. That will most likely be fast enough for your purposes.
Two techniques come to mind: histogramming and sorting. Histogramming will be good for dense number sets (where you have most of the numbers between min and max) and sorting will be good if you have sparse number sets (very few of the numbers between min and max are actually used).
For histogramming, simply walk the array and set a Boolean flag to true at the corresponding position of the histogram (the flags should default to false), then walk the histogram looking for runs of true.
For sorting, simply sort the array using the best applicable sorting technique, then walk the sorted array looking for contiguous runs.
EDIT: some examples.
Let's say you have an array with the first 1,000,000 positive integers, but all even multiples of 191 are removed (you don't know this ahead of time). Histogramming will be a better approach here.
Let's say you have an array containing powers of 2 (2, 4, 8, 16, ...) and 3 (3, 9, 27, 81, ...). For large lists, the list will be fairly sparse and sorting should be expected to do better.
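The histogramming technique might be sketched in C# like this. It assumes the spread between the smallest and largest value is small enough to allocate one flag per possible value; FlagRanges is a hypothetical helper name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FlagRanges
{
    // Marks every number that is present, then scans the flags for runs of true.
    public static List<(int Start, int End)> Find(IReadOnlyCollection<int> numbers)
    {
        var ranges = new List<(int Start, int End)>();
        if (numbers.Count == 0) return ranges;

        int min = numbers.Min(), max = numbers.Max();
        var present = new bool[max - min + 1];   // assumes (max - min) is modest
        foreach (int n in numbers) present[n - min] = true;

        int runStart = -1;                       // -1 means "not inside a run"
        for (int i = 0; i <= present.Length; i++)
        {
            bool on = i < present.Length && present[i];
            if (on && runStart < 0)
            {
                runStart = i;                    // a run begins
            }
            else if (!on && runStart >= 0)
            {
                ranges.Add((runStart + min, i - 1 + min));   // a run just ended
                runStart = -1;
            }
        }
        return ranges;
    }
}
```

The cost is proportional to the number of inputs plus the spread (max - min), which is why it suits dense sets and wastes memory on sparse ones.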
As Mike said, first sort the list. Now, starting with the first element, remember that element, then compare it with the next one. If the next element is 1 greater than the current one, you have a contiguous series. Continue this until the next number is NOT contiguous. When you reach that point, you have a range from the first remembered value to the current value. Remember/output that range, then start again with the next value as the first element of a new series. The scan itself is linear; together with the sort, the total is O(N log N).
I would sort them and then check for consecutive numbers. If the difference is greater than 1, you have a new range.
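A minimal C# sketch of the sort-then-scan approach described in these answers. The SortedRanges name is made up, and duplicates are dropped before the scan, which is an assumption about what the asker wants:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SortedRanges
{
    // Sorts the distinct numbers, then closes a range whenever the next number
    // is not exactly one greater than the previous one.
    public static List<(int Start, int End)> Find(IEnumerable<int> numbers)
    {
        var sorted = numbers.Distinct().OrderBy(n => n).ToList();
        var ranges = new List<(int Start, int End)>();
        if (sorted.Count == 0) return ranges;

        int start = sorted[0], previous = sorted[0];
        for (int i = 1; i < sorted.Count; i++)
        {
            if (sorted[i] != previous + 1)
            {
                ranges.Add((start, previous));   // gap found: close the current range
                start = sorted[i];
            }
            previous = sorted[i];
        }
        ranges.Add((start, previous));           // close the final range
        return ranges;
    }
}
```

To write the result to a file, something like File.WriteAllLines(path, ranges.Select(r => $"({r.Start}-{r.End})")) from System.IO should do, where path is whatever output file you choose. For the list in the question this produces (3-5) and (34-40).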