Get individual digits from an Int without using strings? - c#

I know you can convert the Int to a string and get the digit at position x using the indexer as if it was a char array, but this conversion becomes a bit of an overhead when you're dealing with multiple large numbers.
Is there a way to retrieve a digit at position x without converting the number to a string?
EDIT:
Thank you all, I will benchmark the proposed methods and check if it is any better than converting to a string. Thread will stay unanswered for 24h in case anyone has better ideas.
EDIT 2:
After some simple tests on ulong numbers, I have concluded that converting to strings and extracting the digit can be up to 50% slower compared to the methods provided below, see approved answer.

You could do something like this:
// i is the zero-based position, counted from the right
int ith_digit(int n, int i) {
return (int) (n / Math.Pow(10, i)) % 10;
}
We can get the ith digit by reducing the number until the digit we want sits in the ones place. For example:
Let's say you want the third digit from the right in 12345 (i = 2). Dividing by 10 i times reduces it to 123, and the remainder of that number divided by ten gives its last digit, 3, which is the digit we wanted.
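For what it's worth, the same idea can be written with integer arithmetic only, which avoids the floating-point power function and also works for ulong. This is a minimal sketch, assuming positions are counted from the right starting at 0; the names DigitExtractor and DigitAt are made up for illustration:

```csharp
using System;

public static class DigitExtractor
{
    // Returns the digit at zero-based position i, counted from the right.
    // Positions beyond the most significant digit yield 0.
    public static int DigitAt(ulong n, int i)
    {
        if (i < 0) throw new ArgumentOutOfRangeException(nameof(i));
        // Drop the i lowest decimal digits by repeated integer division.
        for (; i > 0 && n != 0; i--)
        {
            n /= 10;
        }
        return (int)(n % 10);
    }
}
```

With this, DigitExtractor.DigitAt(12345, 2) follows the walkthrough above: 12345 becomes 123, and 123 % 10 is 3.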

Related

Is there a binary way to set the hundreds digit of an integer to zero?

I'm trying to write a function HundredPosToZero that sets the hundreds digit to zero, for example:
HundredPosToZero(4239) // 4039
This is my implementation:
public int HundredPosToZero(int num){
return num / 1000 * 1000 + num % 100;
}
However, I'm wondering why I can't use a bitwise operator, like 4239 & 1011, to do the same thing. I can't figure out how to implement it since 4239 is not a binary number. Any advice on this approach?
Not really. There is nothing special about decimal hundreds in binary. It would be possible with e.g. a hexadecimal number, but decimal numbers don't play well with binary :)
It is simply impossible if you want this operation to be the same for all numbers as "hundreds" in regular representation of integers don't occupy the same set of bits.
If you use some other binary representation of numbers (like BCD) that allocates groups of bits to unique decimal digits then you can do that easily.
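To illustrate the BCD remark, here is a rough sketch, assuming values of at most 8 decimal digits packed one digit per 4-bit nibble. BcdDemo and its methods are hypothetical helpers, not a .NET API:

```csharp
using System;

public static class BcdDemo
{
    // Packs a number into BCD: one decimal digit per 4-bit nibble.
    public static uint ToBcd(uint value)
    {
        if (value > 99999999) throw new ArgumentOutOfRangeException(nameof(value));
        uint bcd = 0;
        for (int shift = 0; value != 0; shift += 4)
        {
            bcd |= (value % 10) << shift;
            value /= 10;
        }
        return bcd;
    }

    // Unpacks a BCD value back into an ordinary binary integer.
    public static uint FromBcd(uint bcd)
    {
        uint value = 0;
        uint scale = 1;
        for (; bcd != 0; bcd >>= 4)
        {
            value += (bcd & 0xF) * scale;
            scale *= 10;
        }
        return value;
    }

    // In BCD the hundreds digit always occupies bits 8..11, so a fixed mask clears it.
    public static uint ClearHundreds(uint bcd)
    {
        return bcd & ~0x0F00u;
    }
}
```

Here 4239 packs to 0x4239, the mask turns it into 0x4039, and unpacking gives 4039. The conversion to and from BCD still needs division, so this only pays off if the numbers are kept in BCD to begin with.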

How to get the length of a Float? (C#)

I'm learning C# from the 'Fundamentals of Computer Programming with C#' by Svetlin Nakov and others (available for free here: http://www.introprogramming.info/english-intro-csharp-book/)
After each chapter, the authors like to ask questions that go beyond the scope of the chapter. Pg 135, Question 3 asks me to write a program that can correctly compare two real numbers with an accuracy of 0.000001 (7 significant digits).
So I'm using floats to compare the numbers and I decided to add some code that would check to see if the numbers entered are longer than the 7 significant digits that floats can handle. So I need to check for the number of significant digits. Google tells me that I should use sizeof(float) to do that, but I keep getting CS0246 error on the lines of the sizeof check (The type or namespace could not be found.)
The program works if I don't include the code that checks the length of the numbers. I can't find an answer on SO for C#.
What's the problem?
Edit: Thanks for all the answers. Let me clarify my question: I understand that parsing from string to float automatically checks for validity. However, I tried with my program yesterday, and floats will lose anything beyond 7 significant digits. So if I compare 0.123457 and 0.12345678, the program will declare that the two numbers are the same because the second number is rounded up. That's why I'm trying to catch floats longer than 7 digits. I interpret the question this way because it occurred to me that these two very similar, but not identical, numbers slip through the cracks.
using System;
// Compare two real numbers with up to 0.000001 (7) significant digits
class Compare_Numbers
{
static void Main(string[] args)
{
// Processing the first number
String firstNumString = null;
Console.WriteLine("This program compares 2 numbers with upto 7 significant digits.\nEnter the FIRST number with up to 7 significant digits");
firstNumString = Console.ReadLine();
float firstNum = Single.Parse(firstNumString);
if (sizeof(firstNum) > 7)
{
Console.WriteLine("That number is too long!\nEnter a number with a MAX of 7 significant digits!");
}
// Processing the second number
String secondNumString = null;
Console.WriteLine("Enter the SECOND number with up to 7 significant digits");
secondNumString = Console.ReadLine();
float secondNum = Single.Parse(secondNumString);
if (sizeof(secondNum) > 7)
{
Console.WriteLine("That number is too long!\nEnter a number with a MAX of 7 significant digits!");
}
if (firstNum == secondNum)
{
Console.WriteLine("The two numbers are the SAME!");
}
else
{
Console.WriteLine("The two numbers are DIFFERENT!");
}
}
}
static private int GetNumDigitsInFloat(float n)
{
string s = n.ToString();
return s.Length - 1;
}
"How to get the length of a Float?"
In short, assuming 7 significant digits:
firstNum.ToString("G7").Length // 7 significant digits
Ex.
float pi = 3.14159265f;
string g5 = pi.ToString("G5");
string g7 = pi.ToString("G7");
However, your title asks something simple, but the body of your question indicates something else. So it appears that you think finding the length of a float is en route to a larger solution. I am not sure, so I will just try to point out several issues.
First, you are misusing sizeof; sizeof() in C# takes a type, not a variable. (So sizeof(float) would work, but not sizeof(num) ).
In any case, sizeof isn't going to give you the number of significant digits. It will give you the number of bytes used to store the unmanaged type, which is constant (4, 8, etc.). Instead, for a given string, use String.Length.
However, what you can't do is try to parse the number to a float, and then try to check for out of range values by checking the float variable. By definition, if you can successfully parse to a float, then it was valid. If it is invalid, it won't parse. The part of your example where you use Single.Parse() and then proceed to try validating using the float variable is moot. You need to validate the string value, or validate that the parse succeeds, or change your approach.
I think the simplest solution is to just use Single.TryParse() and check the boolean return value. If it returns false, either the string value is invalid, or it is outside the range Single.MinValue to Single.MaxValue. So you might rethink your approach (since it isn't the author's original challenge). Personally, I would use a larger type in C# for my calculator, but the purpose of the exercise might be to learn these tangential issues, so:
(1) If you already have a single precision (float), then you can get the length by converting to a string using Single.ToString() or string.Format() and using string.Length on the result, though it will include the decimal point, so account for that.
See the ToString() and format specifiers at:
http://msdn.microsoft.com/en-us/library/fzeeb5cd(v=vs.110).aspx
http://msdn.microsoft.com/en-us/library/0c899ak8(v=vs.110).aspx
The problem with this is that by the time you use ToString() you already have a valid float, and by then the check is moot. You need to check the original value. See (2).
(2) If you are starting from a string (which in this sample you are: reading from the console into a string, then parsing with Single.Parse()), then you will either get a valid value or an exception. You need to use a try-catch block with Single.Parse(), or otherwise switch to Single.TryParse() and check the boolean return value.
Lastly, if you want to ensure you can both parse the value, as well as validate numbers of greater precision or range, you may add a Double.TryParse() as well.
float snum;
double dnum;
if (!Single.TryParse(str, out snum)) {
if (Double.TryParse(str, out dnum)) {
// valid number, but out of range for Single
} else {
// out of range for Double, or not a number at all
}
}
Or you may use Single.Parse and catch(OverflowException)
try {
snum = Single.Parse(str);
}
catch(OverflowException e) {
}
All of that addresses your actual question, but in my opinion the spirit of the problem is how to compare 2 valid numbers. In that case, use TryParse() to validate them, then just compare them directly, or use the approaches given in @drf's answer https://stackoverflow.com/a/24482343/257090
If the intent is to ensure that two floating-point numbers are approximately equal within 1E-7, there is no need to convert to a string at all. In fact, using string conversions is probably not a robust approach:
The question as written requires seven significant figures after the decimal point, not seven significant figures total. The length of the string will include both.
Single.ToString by default only prints 7 digits of precision, but stores up to 9 internally. The floats 20.000001 and 20.000003 will both convert to "20" by default, but are not equal within 1e-7. (This is a function of the string conversion, not precision limits with this datatype.)
Floats can be parsed and output in exponential notation. What happens if a user enters 1e+20 at the prompt, a perfectly-valid float with a length of 5?
Given two numbers f1 and f2, and an accepted error ε of 10^-7, it follows mathematically that f1 and f2 should be considered equal (within accepted tolerance) if and only if:
|f1 - f2| ≤ ε
Or in C#, we can write the equivalent function:
// preconditions: eps is non-negative; f1 and f2 are not NaN, Inf, or -Inf.
bool AreApproximatelyEqual(float f1, float f2, float eps)
{
return Math.Abs(f1 - f2) <= eps;
}
Given floats firstNum and secondNum, approximate equality is obtained as:
bool equal = AreApproximatelyEqual(firstNum, secondNum, 1e-7f);
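Since a float only carries about 7 significant digits, a tolerance as small as the exercise's 0.000001 can be finer than what float resolves for larger values. One possible variation, sketched here under the assumption that the inputs arrive as strings and that double is acceptable, parses straight to double and applies the same |f1 - f2| ≤ ε test. ToleranceCompare and TryCompare are illustrative names, and the invariant culture is an assumption about the input format:

```csharp
using System;
using System.Globalization;

public static class ToleranceCompare
{
    // Parses both inputs as double and compares them within the given tolerance.
    // Returns false when either input is not a finite number; `equal` is only
    // meaningful when the method returns true.
    public static bool TryCompare(string first, string second, double eps, out bool equal)
    {
        equal = false;
        double a, b;
        if (!double.TryParse(first, NumberStyles.Float, CultureInfo.InvariantCulture, out a) ||
            !double.TryParse(second, NumberStyles.Float, CultureInfo.InvariantCulture, out b))
        {
            return false;
        }
        if (double.IsNaN(a) || double.IsInfinity(a) || double.IsNaN(b) || double.IsInfinity(b))
        {
            return false;
        }
        equal = Math.Abs(a - b) <= eps;
        return true;
    }
}
```

With a tolerance of 1e-6, the inputs 0.123457 and 0.12345678 compare as equal (they differ by about 2.2e-7), while 20.000001 and 20.000003 do not.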

Get last (ie endswith) 3 digits of a decimal (.NET)

I may be using Math for evil... But, in a number written as 0.7000123
I need to get the "123" - That is, I need to extract the last 3 digits in the decimal portion of a number. The least significant digits, when the first few are what most people require.
Examples:
0.7500123 -> 123
0.5150111 -> 111
It always starts from digit 5. And yes, I'm storing secret information inside this number, in the part of the decimal that will not affect how the number is used - which is the potentially evil part. But it's still the best way around a certain problem I have.
I'm wondering whether math or string manipulation is the least dodgy way of doing this.
Performance is not an issue, at all, since I'm calling it once.
Can anyone see an easy mathematical way of doing this? eg A combination of Math functions (I've missed) in .NET?
It's a strange request to be sure. But one way to get an int value of the last 3 digits is like so:
int x = (int)((yourNumber * 10000000) % 1000);
I'm going to guess there's a better way to get the information you're looking for that's cleaner, but given what you've asked for, this should work.
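One caveat if yourNumber is a double: the product is not guaranteed to be an exact whole number, so a plain cast may truncate downward. A hedged sketch that rounds first, alongside a decimal variant that needs no rounding, could look like this. It assumes a non-negative value with exactly 7 decimal places, as in the question; LastThree and its method names are made up:

```csharp
using System;

public static class LastThree
{
    // Assumes a non-negative value with exactly 7 decimal places.
    public static int FromDouble(double value)
    {
        // Round to the nearest whole number before taking the remainder,
        // so a product such as 7000122.9999999 is not truncated to ...122.
        long scaled = (long)Math.Round(value * 10000000.0);
        return (int)(scaled % 1000);
    }

    // decimal stores base-10 fractions exactly, so no rounding step is needed.
    public static int FromDecimal(decimal value)
    {
        return (int)(value * 10000000m % 1000m);
    }
}
```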
First convert your number to a string.
string s = num.ToString();
string s1 = s.Substring(s.Length - 3, 3);
Now s1 contains the last 3 digits of the number.
Using modulo will get you the last 3 digits:
var d = 0.7000123m;
d = d * 10000000 % 1000;
d will now hold the value 123.
Try this:
string value= "0.1234567";
string lastthreedigit= value.Substring(value.Length - 3);

Distinct number algorithm from string

I'm working on a simple game and I have the requirement of taking a word or phrase such as "hello world" and converting it to a series of numbers.
The criteria is:
Numbers need to be distinct
Need ability to configure maximum sequence of numbers. IE 10 numbers total.
Need ability to configure max range for each number in sequence.
Must be deterministic, that is, we should get the same sequence every time for the same input phrase.
I've tried breaking down the problem like so:
Convert characters to ASCII number code: "hello world" = 104 101 108 108 111 32 119 111 114 108 100
Remove every other number until we reach the required total (10 in this case)
For each number, if it is greater than the max number, divide it by 2 until it is <= the max number
If any numbers are duplicated, increase or decrease the first occurrence until they are distinct. (This could cause a problem, as you could create a duplicate by solving another duplicate)
Is there a better way of doing this or am I on the right track? As stated above I think I may run into issues with removing distinction.
If you want to limit the size of the output series - then this is impossible.
Proof:
Assume your output is a series of size k, with each element taking one of at most M values for some predefined M; then there are at most M^k possible outputs.
However, there are infinitely many inputs, and in particular there are at least M^k+1 different inputs.
By the pigeonhole principle (where the inputs are the pigeons and the outputs are the pigeonholes), at least 2 pigeons (inputs) share one pigeonhole (output), so the requirement cannot be achieved.
Original answer, provides workaround without limiting the size of the output series:
You can use prime numbers, let p1,p2,... be the series of prime numbers.
Then, convert the string into series of numbers using number[i] = ascii(char[i]) * p_i
The range of each character is obviously then [0,255 * p_i]
Provided every prime used is larger than 255, then for each i,j such that i != j -> p_i * x != p_j * y (for all character codes x,y in [1,255]), so you get uniqueness. (With small primes this fails, e.g. 2 * 3 == 3 * 2.) However, this is mainly nice theoretically, as the generated numbers might grow quickly, and for a practical implementation you are going to need some big number library such as Java's BigInteger (cannot recall the C# equivalent)
Another possible solution (with the same relaxation of no series limitation) is:
number[i] = ascii(char[i]) + 256*(i-1)
In here the range for number[i] is [256*(i-1),256*i), and elements are still distinct.
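The offset scheme above could be sketched roughly as follows. The index is zero-based here, so position i lands in [256*i, 256*(i+1)) rather than [256*(i-1), 256*i); PositionalEncoder is an illustrative name, and the sketch assumes every character code is below 256:

```csharp
using System;
using System.Collections.Generic;

public static class PositionalEncoder
{
    // Maps each character to code + 256 * index, so every position gets its own
    // non-overlapping block of 256 values and all results are distinct.
    public static List<int> Encode(string text)
    {
        var result = new List<int>(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] > 255) throw new ArgumentException("Only character codes 0..255 are supported.");
            result.Add(text[i] + 256 * i);
        }
        return result;
    }
}
```

For "hello world" the first values are 104, 357 (101 + 256) and 620 (108 + 512). The repeated letters no longer collide because each one is shifted by its position.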
Mathematically, it is theoretically possible to do what you want, but you won't be able to do it in C#:
If your outputs are required to be distinct, then you cannot lose any information after encoding the string using ASCII values. This means that if you limit your output size to n numbers then the numbers will have to include all information from the encoding.
So for your example
"hello world" -> 104 101 108 108 111 32 119 111 114 108 100
you would have to preserve the meaning of each of those numbers. The simplest way to do this would be to zero-pad your numbers to three digits and concatenate them together into one large number, making your result 104101108108111032119111114108100 for max numbers = 1.
(You can see where the issue arises: for arbitrary-length input you need very large numbers.) So certainly it is possible to encode any arbitrary-length string input to n numbers, but the numbers will become exceedingly large.
If by "numbers" you meant digits, then no, you cannot have distinct outputs, as @amit explained in his example with the pigeonhole principle.
Let's eliminate your criteria as easily as possible.
For distinct, deterministic, just use a hash code. (Hash actually isn't guaranteed to be distinct, but is highly likely to be):
string s = "hello world";
uint hash = unchecked((uint)s.GetHashCode());
Note that I cast the signed int returned from GetHashCode to unsigned, to avoid the chance of having a '-' appear. (Convert.ToUInt32 would throw an OverflowException for a negative hash code, hence the unchecked cast.)
Then, for your max range per number, just convert the base.
That leaves you with the maximum sequence criteria. Without understanding your requirements better, all I can propose is truncate if necessary:
hash.ToString().Substring(0, size)
Truncating leaves a chance that you'll no longer be distinct, but that must be built in as acceptable to your requirements? As amit explains in another answer, you can't have infinite input and non-infinite output.
Ok, so in one comment you've said that this is just to pick lottery numbers. In that case, you could do something like this:
public static List<int> GenNumbers(String input, int count, int maxNum)
{
List<int> ret = new List<int>();
Random r = new Random(input.GetHashCode());
for (int i = 0; i < count; ++i)
{
int next = r.Next(maxNum - i);
foreach (int picked in ret.OrderBy(x => x))
{
if (picked <= next)
++next;
else
break;
}
ret.Add(next);
}
return ret;
}
The idea is to seed a random number generator with the hash code of the String. The rest of that is just picking numbers without replacement. I'm sure it could be written more efficiently - an alternative is to generate all maxNum numbers and shuffle the first count. Warning, untested.
I know newer versions of the .NET runtime use a randomized String hash code algorithm (so results will differ between runs). On .NET Framework this is opt-in, but on .NET Core and later it is always on. Writing your own hash algorithm is an option.
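As a rough sketch of the "write your own hash" option combined with the shuffle alternative mentioned above: the hash below is an FNV-1a-style loop over the string's UTF-16 code units (an assumption chosen for simplicity, not a standard .NET API), and LotteryPicker is a made-up name. Unlike the version above, it returns numbers in 1..maxNum. Note that the sequence produced by a seeded Random is not formally guaranteed to stay identical across runtime versions.

```csharp
using System;
using System.Collections.Generic;

public static class LotteryPicker
{
    // FNV-1a-style 32-bit hash; the same input gives the same value on every run.
    public static int StableHash(string text)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (char c in text)
            {
                hash ^= c;
                hash *= 16777619;
            }
            return (int)hash;
        }
    }

    // Picks `count` distinct numbers from 1..maxNum with a partial Fisher-Yates shuffle.
    public static List<int> GenNumbers(string input, int count, int maxNum)
    {
        if (count < 0 || count > maxNum) throw new ArgumentOutOfRangeException(nameof(count));
        var pool = new int[maxNum];
        for (int i = 0; i < maxNum; i++) pool[i] = i + 1;

        var rng = new Random(StableHash(input));
        var picked = new List<int>(count);
        for (int i = 0; i < count; i++)
        {
            int j = rng.Next(i, maxNum); // i <= j < maxNum
            int tmp = pool[i]; pool[i] = pool[j]; pool[j] = tmp;
            picked.Add(pool[i]);
        }
        return picked;
    }
}
```

Calling LotteryPicker.GenNumbers("hello world", 6, 49) twice within the same process and runtime yields the same six distinct numbers.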

How do I find the average in a LARGE set of numbers?

I have a large set of numbers, probably in the multiple gigabytes range. First issue is that I can't store all of these in memory. Second is that any attempt at addition of these will result in an overflow. I was thinking of using more of a rolling average, but it needs to be accurate. Any ideas?
These are all floating point numbers.
This is not read from a database; it is a CSV file collected from multiple sources. It has to be accurate, as it is stored as parts of a second (e.g., 0.293482888929), and a rolling average can be the difference between .2 and .3
It is a set of numbers representing how long users took to respond to certain form actions. For example, when showing a messagebox, how long did it take them to press OK or Cancel? The data was sent to me stored as seconds.portions of a second; 1.2347 seconds, for example. If I convert it to milliseconds, I overflow int, long, etc. rather quickly. Even if I don't convert it, I still overflow rather quickly. I guess the one answer below is correct, that maybe I don't have to be 100% accurate, just look within a certain range inside of a specific StdDev and I would be close enough.
You can sample randomly from your set ("population") to get an average ("mean"). The accuracy will be determined by how much your samples vary (as determined by "standard deviation" or variance).
The advantage is that you have billions of observations, and you only have to sample a fraction of them to get a decent accuracy or the "confidence range" of your choice. If the conditions are right, this cuts down the amount of work you will be doing.
Here's a numerical library for C# that includes a random sequence generator. Just make a random sequence of numbers that reference indices in your array of elements (from 1 to x, the number of elements in your array). Dereference to get the values, and then calculate your mean and standard deviation.
If you want to test the distribution of your data, consider using the Chi-Squared Fit test or the K-S test, which you'll find in many spreadsheet and statistical packages (e.g., R). That will help confirm whether this approach is usable or not.
Integers or floats?
If they're integers, you need to accumulate a frequency distribution by reading the numbers and recording how many of each value you see. That can be averaged easily.
For floating point, this is a bit of a problem. Given the overall range of the floats, and the actual distribution, you have to work out a bin-size that preserves the accuracy you want without preserving all of the numbers.
Edit
First, you need to sample your data to get a mean and a standard deviation. A few thousand points should be good enough.
Then, you need to determine a respectable range. Folks pick things like ±6σ (standard deviations) around the mean. You'll divide this range into as many buckets as you can stand.
In effect, the number of buckets determines the number of significant digits in your average. So, pick 10,000 or 100,000 buckets to get 4 or 5 digits of precision. Since it's a measurement, odds are good that your measurements only have two or three digits.
Edit
What you'll discover is that the mean of your initial sample is very close to the mean of any other sample. And any sample mean is close to the population mean. You'll note that most (but not all) of your means are within 1 standard deviation of each other.
You should find that your measurement errors and inaccuracies are larger than your standard deviation.
This means that a sample mean is as useful as a population mean.
Wouldn't a rolling average be as accurate as anything else (discounting rounding errors, I mean)? It might be kind of slow because of all the dividing.
You could group batches of numbers and average them recursively. Like average 100 numbers 100 times, then average the result. This would be less thrashing and mostly addition.
In fact, if you added 256 or 512 at once you might be able to bit-shift the result by either 8 or 9, (I believe you could do this in a double by simply changing the floating point mantissa)--this would make your program extremely quick and it could be written recursively in just a few lines of code (not counting the unsafe operation of the mantissa shift).
Perhaps dividing by 256 would already use this optimization? I may have to speed test dividing by 255 vs 256 and see if there is some massive improvement. I'm guessing not.
You mean of 32-bit and 64-bit numbers. But why not just use a proper Rational Big Num library? If you have so much data and you want an exact mean, then just code it.
// Bignum and RationalBignum stand in for an arbitrary-precision library; they are not .NET types.
class RationalBignum {
public Bignum Numerator { get; set; }
public Bignum Denominator { get; set; }
}
class BigMeanr {
public static int Main(string[] argv) {
var sum = new RationalBignum(0);
var n = new Bignum(0);
using (var s = File.OpenRead(argv[0])) {
using (var r = new BinaryReader(s)) {
try {
while (true) {
var flt = r.ReadSingle();
var rat = new RationalBignum(flt);
sum += rat;
n++;
}
}
catch (EndOfStreamException) {
// End of input reached; fall through and print the result.
}
}
}
Console.WriteLine("The mean is: {0}", sum / n);
return 0;
}
}
Just remember, there are more numeric types out there than the ones your compiler offers you.
You could break the data into sets of, say, 1000 numbers, average these, and then average the averages.
This is a classic divide-and-conquer type problem.
The key point is that the average of a large set of numbers is the same as the average of the first half of the set combined with the average of the second half, each weighted by its size.
In other words (for even N, where both halves have the same size):
AVG(A[1..N]) == AVG( AVG(A[1..N/2]), AVG(A[N/2+1..N]) )
Here is a simple, recursive C# solution.
It passed my tests, and should be completely correct.
public struct SubAverage
{
public float Average;
public int Count;
};
static SubAverage AverageMegaList(List<float> aList)
{
if (aList.Count <= 500) // Brute-force average 500 numbers or less.
{
SubAverage avg;
avg.Average = 0;
avg.Count = aList.Count;
foreach(float f in aList)
{
avg.Average += f;
}
avg.Average /= avg.Count;
return avg;
}
// For more than 500 numbers, break the list into two sub-lists.
SubAverage subAvg_A = AverageMegaList(aList.GetRange(0, aList.Count/2));
SubAverage subAvg_B = AverageMegaList(aList.GetRange(aList.Count/2, aList.Count-aList.Count/2));
SubAverage finalAnswer;
finalAnswer.Average = subAvg_A.Average * subAvg_A.Count/aList.Count +
subAvg_B.Average * subAvg_B.Count/aList.Count;
finalAnswer.Count = aList.Count;
Console.WriteLine("The average of {0} numbers is {1}",
finalAnswer.Count, finalAnswer.Average);
return finalAnswer;
}
The trick is that you're worried about an overflow. In that case, it all comes down to order of execution. The basic formula is like this:
Given:
A = current avg
C = count of items
V = next value in the sequence
The next average (A1) is:
(C * A) + V
A1 = ———————————
C + 1
The danger is that over the course of evaluating the sequence, while A should stay relatively manageable, C will become very large.
Eventually C * A will overflow the integer or double types.
One thing we can try is to re-write it like this, to reduce the chance of an overflow:
A1 = C/(C+1) * A/(C+1) + V/(C+1)
In this way, we never multiply C * A and only deal with smaller numbers. But the concern now is the result of the division operations. If C is very large, C/(C+1) (for example) may not be meaningful when constrained to normal floating point representations. The best I can suggest is to use the largest type possible for C here.
Here's one way to do it in pseudocode:
average=first
count=1
while more:
count+=1
diff=next-average
average+=diff/count
return average
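A possible C# rendering of the pseudocode above, written as a sketch over any lazily enumerated sequence so the whole data set never has to sit in memory. RunningMean is an illustrative name; the choice of double and of throwing on an empty input are assumptions:

```csharp
using System;
using System.Collections.Generic;

public static class RunningMean
{
    // Incremental mean: each value nudges the average by (value - average) / count,
    // so the running total is never formed and cannot overflow.
    public static double Of(IEnumerable<double> values)
    {
        double average = 0.0;
        long count = 0;
        foreach (double value in values)
        {
            count++;
            average += (value - average) / count;
        }
        if (count == 0) throw new InvalidOperationException("Sequence contains no elements.");
        return average;
    }
}
```

For a file with one number per line, something like RunningMean.Of(File.ReadLines(path).Select(line => double.Parse(line, CultureInfo.InvariantCulture))) would stream the values one at a time (this needs System.IO, System.Linq and System.Globalization).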
Sorry for the late comment, but isn't the formula above, provided by Joel Coehoorn, rewritten wrongly?
I mean, the basic formula is right:
Given:
A = current avg
C = count of items
V = next value in the sequence
The next average (A1) is:
A1 = ( (C * A) + V ) / ( C + 1 )
But instead of:
A1 = C/(C+1) * A/(C+1) + V/(C+1)
shouldn't we have:
A1 = C/(C+1) * A + V/(C+1)
That would explain kastermester's post:
"My math ticks off here - You have C, which you say "go towards infinity" or at least, a really big number, then: C/(C+1) goes towards 1. A /(C+1) goes towards 0. V/(C+1) goes towards 0. All in all: A1 = 1 * 0 + 0 So put shortly A1 goes towards 0 - seems a bit off. – kastermester"
Because we would have A1 = 1 * A + 0, i.e., A1 goes towards A, which is right.
I've been using such method for calculating averages for a long time and the aforementioned precision problems have never been an issue for me.
With floating point numbers the problem is not overflow, but loss of precision when the accumulated value gets large. Adding a small number to a huge accumulated value will result in losing most of the bits of the small number.
There is a clever solution by the primary architect of the IEEE floating point standard himself, the Kahan summation algorithm, which deals with exactly this kind of problem by checking the error at each step and keeping a running compensation term that prevents losing the small values.
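A minimal sketch of Kahan (compensated) summation applied to a mean, assuming the values are doubles and can be enumerated once. KahanMean is an illustrative name:

```csharp
using System;
using System.Collections.Generic;

public static class KahanMean
{
    public static double Mean(IEnumerable<double> values)
    {
        double sum = 0.0;
        double compensation = 0.0; // running estimate of the rounding error lost so far
        long count = 0;
        foreach (double value in values)
        {
            double corrected = value - compensation;   // re-inject the previously lost bits
            double newSum = sum + corrected;           // low-order bits of `corrected` may be lost here
            compensation = (newSum - sum) - corrected; // recover what was just lost
            sum = newSum;
            count++;
        }
        if (count == 0) throw new InvalidOperationException("Sequence contains no elements.");
        return sum / count;
    }
}
```

The extra cost is a few additions per element, which is usually negligible next to reading the data from disk.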
If the numbers are int's, accumulate the total in a long. If the numbers are long's ... what language are you using? In Java you could accumulate the total in a BigInteger, which is an integer which will grow as large as it needs to be. You could always write your own class to reproduce this functionality. The gist of it is just to make an array of integers to hold each "big number". When you add two numbers, loop through starting with the low-order value. If the result of the addition sets the high order bit, clear this bit and carry the one to the next column.
Another option would be to find the average of, say, 1000 numbers at a time. Hold these intermediate results, then when you're done average them all together.
Why is a sum of floating point numbers overflowing? In order for that to happen, you would need to have values near the max float value, which sounds odd.
If you were dealing with integers I'd suggest using a BigInteger, or breaking the set into multiple subsets, recursively averaging the subsets, then averaging the averages.
If you're dealing with floats, it gets a bit weird. A rolling average could become very inaccurate. I suggest using a rolling average which is only updated when you hit an overflow exception or the end of the set. So effectively dividing the set into non-overflowing sets.
Two ideas from me:
If the numbers are ints, use an arbitrary precision library like IntX - this could be too slow, though
If the numbers are floats and you know the total amount, you can divide each entry by that number and add up the result. If you use double, the precision should be sufficient.
Why not just scale the numbers (down) before computing the average?
If I were to find the mean of billions of doubles as accurately as possible, I would take the following approach (NOT TESTED):
Find out 'M', an upper bound for log2(nb_of_input_data). If there are billions of data, 50 may be a good candidate (> 1 000 000 billions capacity). Create an L1 array of M double elements. If you're not sure about M, creating an extensible list will solve the issue, but it is slower.
Also create an associated L2 boolean array (all cells set to false by default).
For each incoming data D:
int i = 0;
double localMean = D;
while (L2[i]) {
L2[i] = false;
localMean = (localMean + L1[i]) / 2;
i++;
}
L1[i] = localMean;
L2[i] = true;
And your final mean will be:
double sum = 0;
double totalWeight = 0;
for (int i = 0; i < 50; i++) {
if (L2[i]) {
// 1L keeps the shift in 64 bits; a plain int shift would wrap for i >= 31
long weight = 1L << i;
sum += L1[i] * weight;
totalWeight += weight;
}
}
return sum / totalWeight;
Notes:
Many proposed solutions in this thread miss the point of lost precision.
Using binary instead of 100-group-or-whatever provides better precision, and doubles can be safely doubled or halved without losing precision!
Try this
Iterate through the numbers incrementing a counter, and adding each number to a total, until adding the next number would result in an overflow, or you run out of numbers.
( It makes no difference if the inputs are integers or floats - use the largest precision float you can and convert each input to that type)
Divide the total by the counter to get a mean ( a floating point), and add it to a temp array
If you had run out of numbers, and there is only one element in temp, that's your result.
Start over using the temp array as input, ie iteratively recurse until you reached the end condition described earlier.
Depending on the range of the numbers, it might be a good idea to have an array where the subscript is your number and the value is the count of that number; you could then do your calculation from this.
