I know you can convert the Int to a string and get the digit at position x using the indexer as if it was a char array, but this conversion becomes a bit of an overhead when you're dealing with multiple large numbers.
Is there a way to retrieve a digit at position x without converting the number to a string?
EDIT:
Thank you all, I will benchmark the proposed methods and check if it is any better than converting to a string. Thread will stay unanswered for 24h in case anyone has better ideas.
EDIT 2:
After some simple tests on ulong numbers, I have concluded that converting to strings and extracting the digit can be up to 50% slower than the methods provided below; see the accepted answer.
You could do something like this:
#include <math.h>

// i is zero-based and counted from the rightmost digit.
int ith_digit(int n, int i) {
    return (int) (n / pow(10, i)) % 10;
}
We can get the ith digit by shrinking the number until the digit we want sits in the ones place. For example:
Let's say you wanted the digit 3 in 12345, which is at position i = 2 when counting from the right and starting at zero. Dividing by 10 i times reduces 12345 to 123, and the remainder of 123 divided by ten is 3, the digit we wanted.
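The same reduction can be done with integer arithmetic only, which avoids the floating-point pow call. The following is a minimal C# sketch, assuming ulong inputs as in the question's benchmark; the names DigitTools and DigitAt are made up for illustration.

```csharp
using System;

static class DigitTools
{
    // Returns the digit of n at zero-based position i, counted from the right
    // (position 0 is the ones place). Positions past the leading digit yield 0.
    public static int DigitAt(ulong n, int i)
    {
        if (i < 0) throw new ArgumentOutOfRangeException(nameof(i));
        for (int k = 0; k < i && n != 0; k++)
        {
            n /= 10; // drop the current lowest digit
        }
        return (int)(n % 10);
    }
}
```

For example, DigitTools.DigitAt(12345, 2) divides twice to reach 123 and then returns 3.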
There have been many questions about this, but I can't seem to find the why in the answers. It's usually: no, replace this with this, or this should work.
My task is to create a program that asks the user to input a 3-digit positive integer (decimal) and converts it to octal.
For example, on paper: To convert the number 112 to octal. (8 is the base number for octal.)
These are the steps you would take:
112 / 8 = 14 remainder = 0
14 / 8 = 1 remainder = 6
1 / 8 = 0 remainder = 1
The remainders, read from bottom to top, form the octal number that represents 112 in decimal.
So the octal number for 112 is 160.
I found the following program on the internet but I don't understand it fully.
The comments in the program are mine. Could anyone explain it to me please?
//declaration and initialization of variables but why is there an array?
int decimalNumber, quotient, i = 1, j;
int[] octalNumber = new int[100];
//input
Console.WriteLine("Enter a Decimal Number :");
decimalNumber = int.Parse(Console.ReadLine());
quotient = decimalNumber;
//as long as quotient is not equal to 0, statement will run
while (quotient != 0)
{
//this is how the remainder is calculated but it is then put in an array + 1, i don't understand this.
octalNumber[i++] = quotient % 8;
//divide the number given by the user with the octal base number
quotient = quotient / 8;
}
Console.Write("Equivalent Octal Number is ");
//I don't understand the code below here as well.
for (j = i - 1; j > 0; j--)
Console.Write(octalNumber[j]);
Console.Read();
Any help is truly appreciated.
The first thing to understand is: this is a terrible way to solve this problem. The code is full of odd choices; it looks like someone took a bad C solution of this problem and translated it to C# without applying careful thought or using good practices. If you are trying to learn how to understand crappy code you find on the internet, this is a great example. If you are trying to learn how to design good code, this is a great example of what not to do.
//declaration and initialization of variables but why is there an array?
There's an array because we wish to store all the octal digits, and an array is a convenient mechanism for storing a number of data of the same type.
But we could ask some more pertinent questions here:
Why of size 100? It's not wrong, but that's enormously larger than necessary. What thought process led to 100 being chosen? Why wasn't that thought process documented anywhere?
Why an array of int? We're outputting text, which is a sequence of chars. It would seem more natural to have a bunch of chars.
Why an array? Since we are building a first-in-last-out data structure, a stack seems more appropriate. Or why not simply accumulate a string? That's inefficient if the string is large, but an octal string from a 32 bit integer is never large!
Why does the program produce output to the console? Surely a better factored program would have a method that takes an int and returns an octal string, which can then be printed.
Why do some of the variables have descriptive names and some have undescriptive names? Is the author of the code deliberately trying to confuse the reader? Or did they simply not think about it very carefully?
Why does i, apparently the current index into the array, start at one?! This is simply bizarre. Arrays start at zero in C#.
What happens if you type in a negative number? Try it!
What happens if you type in zero?
We then go on to:
decimalNumber = int.Parse(Console.ReadLine());
This code presumes that the typed-in text is a legal integer, which is not guaranteed. So this program can crash. TryParse should be used, and the failure mode should be handled.
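As a rough illustration of that advice, the input check could look like the sketch below. The helper name TryReadNonNegative and the decision to reject negative numbers are assumptions made for this example, not part of the original program.

```csharp
using System;

static class InputReader
{
    // Returns true only when text parses as an int and that int is not negative.
    // When parsing fails, int.TryParse sets value to 0 instead of throwing.
    public static bool TryReadNonNegative(string text, out int value)
    {
        return int.TryParse(text, out value) && value >= 0;
    }
}
```

A caller would pass Console.ReadLine() as text and, when the method returns false, print a message and ask again rather than crash.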
// this is how the remainder is calculated but it is
// then put in an array + 1, i don't understand this.
octalNumber[i++] = quotient % 8;
The author of the code thinks they are being clever. This is too much cleverness. Rewrite the code in your head to how it should have been implemented in the first place. First, rename i to currentIndex. Next, produce one side effect per statement, not two:
while (quotient != 0)
{
octalNumber[currentIndex] = quotient % 8;
currentIndex += 1;
quotient = quotient / 8;
}
Now it should be clear what is going on.
// I don't understand the code below here as well.
for (j = i - 1; j > 0; j--)
Console.Write(octalNumber[j]);
Do a little example. Suppose the number is 14, which is 16 in octal. First time through the loop we put 6 in slot 1. Next time through, we put 1 in slot 2. So the array is {0, 6, 1, 0, 0, 0, 0 ... } and i is 3. We wish to output 16. So we loop j from i-1 to 1, and print out 1 then 6.
So, exercise for you: write this program again, this time using the conventions of a well-designed C# program. Put your attempt on the code review site and people will be happy to give you tips on how to improve it.
This is already built into .NET, Convert.ToString already does this.
In your code, just after you have decimalNumber = int.Parse(...) you can do this:
Console.WriteLine(Convert.ToString(decimalNumber, 8));
Console.Read();
and then remove the rest of the code.
Now, if you're not asking how to do octal conversion in .NET but actually how that code works, here's how it works:
This loop does the heavy lifting:
1 while (quotient != 0)
{
//this is how the remainder is calculated but it is then put in an array + 1, i don't understand this.
2 octalNumber[i++] = quotient % 8;
//divide the number given by the user with the octal base number
3 quotient = quotient / 8;
}
I added some numbers to the lines to make it easier writing a description.
Basically, the loop does this (lines above correspond to points below).
1. As long as we have a number to convert (i.e. we're still not done), loop.
2. Figure out the least significant digit: this is the remainder after dividing by 8, which is handled by the remainder operator, %. Store this digit into the array in the next position.
3. Divide by 8 to get rid of that least significant digit and shift all the other digits one place to the right.
Then loop back.
However, since we essentially found all the digits from the rightmost side towards the left, the loop at the end writes them back out in their opposite order.
As an exercise to the reader, try to figure out how the code in the question behaves if you:
Input a negative number
Input 0
(hint, it doesn't behave correctly but Convert.ToString does)
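For comparison, here is one possible shape for a hand-written conversion that handles zero explicitly. It is only a sketch: the class and method names are invented, and rejecting negative input is a simplifying assumption (Convert.ToString instead prints the two's complement bit pattern for negative values).

```csharp
using System;
using System.Text;

static class OctalConverter
{
    // Converts a non-negative value to octal text by repeated division by 8.
    public static string ToOctal(int value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
        if (value == 0) return "0"; // the loop below would otherwise produce an empty string

        var digits = new StringBuilder();
        while (value != 0)
        {
            // The remainder is the least significant octal digit; prepend it
            // so the digits end up in most-significant-first order.
            digits.Insert(0, (char)('0' + value % 8));
            value /= 8;
        }
        return digits.ToString();
    }
}
```

OctalConverter.ToOctal(112) gives "160", matching Convert.ToString(112, 8), and an input of 0 gives "0" instead of printing nothing.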
An array is used because a digit is calculated on every iteration of the while loop and each one has to be stored somewhere (e.g. {0, 6, 1}).
The last part of the program prints each digit out, starting with the last item in the array and moving to the first. In this case it would print out:
160
I need a way to take two numbers (int) from the user and return a single output (preferably an int!) that is different for every distinct pair of inputs.
Say the user enters 6, 8 and it returns k. For any other input m, n, such as 6, 7 or 9, 8 (even if only one of the two values changes), a completely different output must be produced. The output has to be unique to that m, n, so I can't use something like m*n: 6 x 4 = 24, but 12 x 2 = 24 as well, so the output is not unique. I need a mapping in which every distinct input gives an output that is not repeated for any other input.
EDIT: In response to Nicolas: the input range can be anything, but will be less than 1000 (and more than 1, of course!)
EDIT 2: In response to Rawling: I can use long (Int64), but I would prefer not to use float or double, because this output will be used in a for loop, and float and double are terrible for for loops; you can check it here
Since your two numbers are less than 1000, you can do k = (1000 * x1) + x2 to get a unique answer. The maximum value would be 999999, which is well within the range of a 32-bit int.
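A small sketch of that encoding, together with the reverse mapping, might look like this. It assumes both inputs lie in 0..999; PairCodec, Encode and Decode are illustrative names only.

```csharp
using System;

static class PairCodec
{
    const int Base = 1000; // assumption: both inputs are in the range 0..999

    // Packs (a, b) into one int: a fills the thousands and up, b the low three digits.
    public static int Encode(int a, int b)
    {
        if (a < 0 || a >= Base || b < 0 || b >= Base)
            throw new ArgumentOutOfRangeException();
        return Base * a + b;
    }

    // Recovers the original pair, which is what shows the mapping is one-to-one.
    public static (int A, int B) Decode(int k) => (k / Base, k % Base);
}
```

Because Decode recovers both inputs from the result, no two different pairs can share an output.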
You can always return a long: from two integers a and b, return 2^|INT_SIZE|*a + b
It is easy to see from the pigeonhole principle that, given two ints, one cannot return a unique int for every different input. Explanation: if you have 2 numbers, each containing n bits, then there are 2^n possibilities for each number, and thus there are (2^n)^2 possible pairs, so by the pigeonhole principle you need at least lg_2((2^n)^2) = 2n bits to represent them.
EDIT: Your edit mentions the range of your numbers is [1,1000] - thus the same idea can be applied: 1000*a + b will generate a unique int for each pairs.
Note that for the same reasons, the resulting integer must be able to take at least 1,000,000 distinct values (one for each pair), or you will get clashes.
Because I don't have 50 posts to comment, I must say, there are functions
called Pairing Functions.
Pairing functions such as Cantor's pairing function (shown at the previous link) and Szudzik's pairing function allow the inputs to be unbounded and still provide a unique and deterministic output.
Here is another similar question on Stack Overflow. (Great, I need 10 reputation to post more than two links..)
(http://) stackoverflow.com/questions/919612/mapping-two-integers-to-one-in-a-unique-and-deterministic-way
EDIT: I'm late.
If you didn't have a hard upper bound, you could do the following:
int Unique (int x, int y)
{
int n = x + y;
int t = (n%2==0) ? ((n/2) * (n+1)) : (n * ((n+1)/2));
return t + x;
}
Mathematically speaking, this will return a unique non negative integer for each (non-negative) pair of integers with no upper bound.
Programmatically speaking, it will run into overflow problems, which could be overcome by using long instead of int for everything except the input variables.
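A possible long-based variant of the function above is sketched here. The names Pairing and Pair are illustrative, and the sketch assumes non-negative inputs as stated.

```csharp
using System;

static class Pairing
{
    // Computes (x + y)(x + y + 1) / 2 + x for non-negative x and y, using long
    // arithmetic so that the intermediate values do not overflow.
    public static long Pair(int x, int y)
    {
        if (x < 0 || y < 0) throw new ArgumentOutOfRangeException();
        long n = (long)x + y;
        // Halve whichever of n and n + 1 is even before multiplying;
        // this keeps the product within the range of long for all int inputs.
        long t = (n % 2 == 0) ? (n / 2) * (n + 1) : n * ((n + 1) / 2);
        return t + x;
    }
}
```

Dividing before multiplying matters here: n * (n + 1) on its own can exceed the range of long when both inputs are close to int.MaxValue.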
The canonical mathematical solution is to use prime powers. As every number can be decomposed uniquely into its prime factors, returning 2^n * 3^m will give you different results for every n and m.
This can be extended to 2^n * 3^m * 5^a * 7^b * 11^c and so on; you only need to check that the result still fits in a 32-bit integer. If there is a risk of overflowing, switch to a wider type such as long or an arbitrary-precision integer. Taking the remainder after dividing by a prime does not, in general, preserve uniqueness, because several different products can share the same remainder.
I want to create a shuffled set of integers such that:
Given the same seed, the shuffle will be the same every time
As I iterate through, every number in the shuffled set will be used exactly once before repeating itself
Will work for large sets (I want all numbers between 0 and 2 billion)
Will generate between a range, for example, 100 to 150.
This option gives a great solution if you want, say, all of the numbers between 0 and a specified number: Generating Shuffled Range Using a PRNG Rather Than Shuffling
Any ideas?
You can use the exact same algorithm as the linked question. Just generate the upperBound - lowerBound + 1 numbers from 0 to upperBound - lowerBound and add lowerBound to each result.
e.g. (using code from linked question):
var upper = 5;
var lower = 3;
foreach (int n in GenerateSequence(upper-lower+1))
{
Console.WriteLine(n+lower);
}
If you want the sequence to repeat (shuffled differently each time), you can add a while (true) around the iterator method body.
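Since GenerateSequence comes from the linked question and is not shown here, the sketch below is a self-contained stand-in and not that question's algorithm. It uses a different, simpler technique: a full-period linear congruential generator over the next power of two, skipping values that fall outside the range. The name RangeShuffler, the multiplier choice and the way the seed is folded in are all assumptions, and the resulting order is deterministic and non-repeating but not statistically strong.

```csharp
using System;
using System.Collections.Generic;

static class RangeShuffler
{
    // Yields every integer in [lower, upper] exactly once, in an order fixed by seed.
    // Assumes the range holds far fewer than 2^62 values.
    public static IEnumerable<long> Shuffled(long lower, long upper, uint seed)
    {
        if (upper < lower) throw new ArgumentException("upper must be >= lower");
        ulong count = (ulong)(upper - lower) + 1;

        // Smallest power of two that is >= count (and at least 4).
        ulong modulus = 4;
        while (modulus < count) modulus <<= 1;
        ulong mask = modulus - 1;

        // An LCG modulo a power of two has full period when the multiplier
        // is 1 (mod 4) and the increment is odd, so every state appears once.
        const ulong multiplier = 1664525;          // 1664525 % 4 == 1
        ulong increment = ((ulong)seed << 1) | 1;  // always odd
        ulong state = seed & mask;

        for (ulong step = 0; step < modulus; step++)
        {
            state = unchecked(state * multiplier + increment) & mask;
            if (state < count) yield return lower + (long)state; // skip out-of-range states
        }
    }
}
```

Nothing is stored per element, so memory use stays constant even for a range such as 0 to 2 billion; at most half of the generated states are skipped.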
I have a large set of numbers, probably in the multiple gigabytes range. First issue is that I can't store all of these in memory. Second is that any attempt at addition of these will result in an overflow. I was thinking of using more of a rolling average, but it needs to be accurate. Any ideas?
These are all floating point numbers.
This is not read from a database; it is a CSV file collected from multiple sources. It has to be accurate, as it is stored as parts of a second (e.g. 0.293482888929), and a rolling average can be the difference between .2 and .3.
It is a set of numbers representing how long users took to respond to certain form actions. For example, when showing a message box, how long did it take them to press OK or Cancel? The data was sent to me stored as seconds and fractions of a second, 1.2347 seconds for example. If I convert it to milliseconds, I overflow int, long, etc. rather quickly. Even if I don't convert it, I still overflow rather quickly. I guess the one answer below is correct: maybe I don't have to be 100% accurate, and if I just look within a certain range inside a specific StdDev I would be close enough.
You can sample randomly from your set ("population") to get an average ("mean"). The accuracy will be determined by how much your samples vary (as determined by "standard deviation" or variance).
The advantage is that you have billions of observations, and you only have to sample a fraction of them to get a decent accuracy or the "confidence range" of your choice. If the conditions are right, this cuts down the amount of work you will be doing.
Here's a numerical library for C# that includes a random sequence generator. Just make a random sequence of numbers that reference indices in your array of elements (from 1 to x, the number of elements in your array). Dereference to get the values, and then calculate your mean and standard deviation.
If you want to test the distribution of your data, consider using the Chi-Squared Fit test or the K-S test, which you'll find in many spreadsheet and statistical packages (e.g., R). That will help confirm whether this approach is usable or not.
Integers or floats?
If they're integers, you need to accumulate a frequency distribution by reading the numbers and recording how many of each value you see. That can be averaged easily.
For floating point, this is a bit of a problem. Given the overall range of the floats, and the actual distribution, you have to work out a bin-size that preserves the accuracy you want without preserving all of the numbers.
Edit
First, you need to sample your data to get a mean and a standard deviation. A few thousand points should be good enough.
Then, you need to determine a respectable range. Folks pick things like ±6σ (standard deviations) around the mean. You'll divide this range into as many buckets as you can stand.
In effect, the number of buckets determines the number of significant digits in your average. So, pick 10,000 or 100,000 buckets to get 4 or 5 digits of precision. Since it's a measurement, odds are good that your measurements only have two or three digits.
Edit
What you'll discover is that the mean of your initial sample is very close to the mean of any other sample. And any sample mean is close to the population mean. You'll note that most (but not all) of your means are within 1 standard deviation of each other.
You should find that your measurement errors and inaccuracies are larger than your standard deviation.
This means that a sample mean is as useful as a population mean.
Wouldn't a rolling average be as accurate as anything else (discounting rounding errors, I mean)? It might be kind of slow because of all the dividing.
You could group batches of numbers and average them recursively. Like average 100 numbers 100 times, then average the result. This would be less thrashing and mostly addition.
In fact, if you added 256 or 512 at once you might be able to bit-shift the result by either 8 or 9, (I believe you could do this in a double by simply changing the floating point mantissa)--this would make your program extremely quick and it could be written recursively in just a few lines of code (not counting the unsafe operation of the mantissa shift).
Perhaps dividing by 256 would already use this optimization? I may have to speed test dividing by 255 vs 256 and see if there is some massive improvement. I'm guessing not.
You mean of 32-bit and 64-bit numbers. But why not just use a proper Rational Big Num library? If you have so much data and you want an exact mean, then just code it.
class RationalBignum {
public Bignum Numerator { get; set; }
public Bignum Denominator { get; set; }
}
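// Bignum and RationalBignum stand for whatever big-number library you pick;
// their constructors and operators are assumed here, not real .NET types.
class BigMeanr {
    public static int Main(string[] argv) {
        var sum = new RationalBignum(0);
        var n = new Bignum(0);
        using (var s = new FileStream(argv[0], FileMode.Open))
        using (var r = new BinaryReader(s)) {
            try {
                while (true) {
                    var flt = r.ReadSingle();
                    var rat = new RationalBignum(flt);
                    sum += rat;
                    n++;
                }
            }
            catch (EndOfStreamException) {
                // End of file reached: every value has been read.
            }
        }
        Console.WriteLine("The mean is: {0}", sum / n);
        return 0;
    }
}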
Just remember, there are more numeric types out there than the ones your compiler offers you.
You could break the data into sets of, say, 1000 numbers, average these, and then average the averages.
This is a classic divide-and-conquer type problem.
The key observation is that the average of a large set of numbers is the same
as the average of the first half of the set averaged with the average of the second half of the set, provided each half is weighted by its size.
In other words (for two halves of equal size):
AVG(A[1..N]) == AVG( AVG(A[1..N/2]), AVG(A[N/2..N]) )
Here is a simple, recursive C# solution.
It has passed my tests, and should be completely correct.
public struct SubAverage
{
public float Average;
public int Count;
};
static SubAverage AverageMegaList(List<float> aList)
{
if (aList.Count <= 500) // Brute-force average 500 numbers or less.
{
SubAverage avg;
avg.Average = 0;
avg.Count = aList.Count;
foreach(float f in aList)
{
avg.Average += f;
}
avg.Average /= avg.Count;
return avg;
}
// For more than 500 numbers, break the list into two sub-lists.
SubAverage subAvg_A = AverageMegaList(aList.GetRange(0, aList.Count/2));
SubAverage subAvg_B = AverageMegaList(aList.GetRange(aList.Count/2, aList.Count-aList.Count/2));
SubAverage finalAnswer;
finalAnswer.Average = subAvg_A.Average * subAvg_A.Count/aList.Count +
subAvg_B.Average * subAvg_B.Count/aList.Count;
finalAnswer.Count = aList.Count;
Console.WriteLine("The average of {0} numbers is {1}",
finalAnswer.Count, finalAnswer.Average);
return finalAnswer;
}
The trick is that you're worried about an overflow. In that case, it all comes down to order of execution. The basic formula is like this:
Given:
A = current avg
C = count of items
V = next value in the sequence
The next average (A1) is:
A1 = ((C * A) + V) / (C + 1)
The danger is that, over the course of evaluating the sequence, A should stay relatively manageable while C will become very large.
Eventually C * A will overflow the integer or double types.
One thing we can try is to re-write it like this, to reduce the chance of an overflow:
A1 = C/(C+1) * A/(C+1) + V/(C+1)
In this way, we never multiply C * A and only deal with smaller numbers. But the concern now is the result of the division operations. If C is very large, C/(C+1) (for example) may not be meaningful when constrained to normal floating point representations. The best I can suggest is to use the largest type possible for C here.
Here's one way to do it in pseudocode:
average=first
count=1
while more:
    count+=1
    diff=next-average
    average+=diff/count
return average
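In C#, the pseudocode above could be written roughly as follows. This is a sketch only; RunningMean and Mean are names chosen for the example, and double is assumed as the element type.

```csharp
using System;
using System.Collections.Generic;

static class RunningMean
{
    // Updates the mean one value at a time: average += (value - average) / count.
    // The running total is never stored, so it cannot grow without bound.
    public static double Mean(IEnumerable<double> values)
    {
        double average = 0.0;
        long count = 0;
        foreach (double value in values)
        {
            count++;
            average += (value - average) / count;
        }
        if (count == 0) throw new InvalidOperationException("Sequence contains no elements.");
        return average;
    }
}
```

Because the method takes an IEnumerable<double>, the values can be streamed from a file line by line instead of being loaded into memory first.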
Sorry for the late comment, but isn't the formula above, provided by Joel Coehoorn, rewritten wrongly?
I mean, the basic formula is right:
Given:
A = current avg
C = count of items
V = next value in the sequence
The next average (A1) is:
A1 = ( (C * A) + V ) / ( C + 1 )
But instead of:
A1 = C/(C+1) * A/(C+1) + V/(C+1)
shouldn't we have:
A1 = C/(C+1) * A + V/(C+1)
That would explain kastermester's post:
"My math ticks off here - You have C, which you say "go towards infinity" or at least, a really big number, then: C/(C+1) goes towards 1. A /(C+1) goes towards 0. V/(C+1) goes towards 0. All in all: A1 = 1 * 0 + 0 So put shortly A1 goes towards 0 - seems a bit off. – kastermester"
Because we would have A1 = 1 * A + 0, i.e., A1 goes towards A, which is right.
I've been using such method for calculating averages for a long time and the aforementioned precision problems have never been an issue for me.
With floating point numbers the problem is not overflow, but loss of precision when the accumulated value gets large. Adding a small number to a huge accumulated value will result in losing most of the bits of the small number.
There is a clever solution by William Kahan, the primary architect of the IEEE floating point standard: the Kahan summation algorithm, which deals with exactly this kind of problem by tracking the rounding error at each step and keeping a running compensation term that prevents the small values from being lost.
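A minimal C# sketch of compensated (Kahan) summation is given below for reference. The class and method names are illustrative; the mean is then the returned sum divided by the number of values.

```csharp
using System;
using System.Collections.Generic;

static class KahanSummation
{
    // Sums the values while carrying a compensation term that holds the
    // low-order bits lost when a small value is added to a large total.
    public static double Sum(IEnumerable<double> values)
    {
        double sum = 0.0;
        double compensation = 0.0;
        foreach (double value in values)
        {
            double y = value - compensation; // apply the correction from the previous step
            double t = sum + y;              // low-order bits of y may be lost here
            compensation = (t - sum) - y;    // recover what was lost (negated)
            sum = t;
        }
        return sum;
    }
}
```

The order of operations in the loop is significant: algebraically the compensation term is always zero, and it only captures the rounding error because each step is evaluated in floating point exactly as written.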
If the numbers are int's, accumulate the total in a long. If the numbers are long's ... what language are you using? In Java you could accumulate the total in a BigInteger, which is an integer which will grow as large as it needs to be. You could always write your own class to reproduce this functionality. The gist of it is just to make an array of integers to hold each "big number". When you add two numbers, loop through starting with the low-order value. If the result of the addition sets the high order bit, clear this bit and carry the one to the next column.
Another option would be to find the average of, say, 1000 numbers at a time. Hold these intermediate results, then when you're done average them all together.
Why is a sum of floating point numbers overflowing? In order for that to happen, you would need to have values near the max float value, which sounds odd.
If you were dealing with integers I'd suggest using a BigInteger, or breaking the set into multiple subsets, recursively averaging the subsets, then averaging the averages.
If you're dealing with floats, it gets a bit weird. A rolling average could become very inaccurate. I suggest using a rolling average which is only updated when you hit an overflow exception or the end of the set. So effectively dividing the set into non-overflowing sets.
Two ideas from me:
If the numbers are ints, use an arbitrary precision library like IntX - this could be too slow, though
If the numbers are floats and you know the total amount, you can divide each entry by that number and add up the result. If you use double, the precision should be sufficient.
Why not just scale the numbers (down) before computing the average?
If I were to find the mean of billions of doubles as accurately as possible, I would take the following approach (NOT TESTED):
Find out 'M', an upper bound for log2(nb_of_input_data). If there are billions of data, 50 may be a good candidate (> 1 000 000 billions capacity). Create an L1 array of M double elements. If you're not sure about M, creating an extensible list will solve the issue, but it is slower.
Also create an associated L2 boolean array (all cells set to false by default).
For each incoming data D:
int i = 0;
double localMean = D;
while (L2[i]) {
L2[i] = false;
localMean = (localMean + L1[i]) / 2;
i++;
}
L1[i] = localMean;
L2[i] = true;
And your final mean will be:
double sum = 0;
double totalWeight = 0;
for (int i = 0; i < 50; i++) {
if (L2[i]) {
long weight = 1L << i;
sum += L1[i] * weight;
totalWeight += weight;
}
}
return sum / totalWeight;
Notes:
Many proposed solutions in this thread miss the point of lost precision.
Using binary instead of 100-group-or-whatever provides better precision, and doubles can be safely doubled or halved without losing precision!
Try this
Iterate through the numbers incrementing a counter, and adding each number to a total, until adding the next number would result in an overflow, or you run out of numbers.
( It makes no difference if the inputs are integers or floats - use the largest precision float you can and convert each input to that type)
Divide the total by the counter to get a mean ( a floating point), and add it to a temp array
If you had run out of numbers, and there is only one element in temp, that's your result.
Start over using the temp array as input, ie iteratively recurse until you reached the end condition described earlier.
Depending on the range of numbers, it might be a good idea to have an array where the subscript is your number and the value is the count of that number; you could then do your calculation from this.