Even experienced programmers write C# code like this sometimes:
double x = 2.5;
double y = 3;
if (x + 0.5 == 3) {
// this will never be executed
}
Basically, it's common knowledge that two doubles (or floats) can never be precisely equal to each other, because of the way the computer handles floating point arithmetic.
The problem is, everyone sort-of knows this, but code like this is still all over the place. It's just so easy to overlook.
Questions for you:
How have you dealt with this in your development organization?
Is this such a common thing that the compiler should be checking that we all should be screaming really loud for VS2010 to include a compile-time warning if someone is comparing two doubles/floats?
UPDATE: Folks, thanks for the comments. I want to clarify that I most certainly understand that the code above is incorrect. Yes, you never want to == compare doubles and floats. Instead, you should use epsilon-based comparison. That's obvious. The real question here is "how do you pinpoint the problem", not "how do you solve the technical issue".
Floating point values certainly can be equal to each other, and in the case you've given they always will be equal. You should almost never compare for equality using equals, but you do need to understand why - and why the example you've shown isn't appropriate.
I don't think it's something the compiler should necessarily warn about, but you may want to see whether it's something FxCop can pick up on. I can't see it in the warning list, but it may be there somewhere...
Personally I'm reasonably confident that competent developers would be able to spot this in code review, but that does rely on you having a code review in place to start with. It also relies on your developers knowing when to use double and when to use decimal, which is something I've found often isn't the case...
static int _yes = 0;
static int _no = 0;
static void Main(string[] args)
{
for (int i = 0; i < 1000000; i++)
{
double x = 1;
double y = 2;
if (y - 1 == x)
{
_yes++;
}
else
{
_no++;
}
}
Console.WriteLine("Yes: " + _yes);
Console.WriteLine("No: " + _no);
Console.Read();
}
Output
Yes: 1000000
No: 0
In our organization we have a lot of financial calculations and we don't use float and double for such tasks. We use Decimal in .NET, BigDecimal in Java and Numeric in MSSQL to escape round-off errors.
This article describes the problem: What Every CS Should Know About floating-Point Arithmetic
If FxCop or similar (as Jon suggests) doesn't work out for you a more heavy handed approach might be to take a copy of the code - replace all instances of float or double with a class you've written that's somewhat similar to System.Double, except that you overload the == operator to generate a warning!
I don't know if this is feasible in practice as I've not tried it - but let us know if you do try :-)
Mono's Gendarme is an FxCop-like tool. It has a rule called AvoidFloatingPointEqualityRule under the Correctness category. You could try it to find instances of this error in your code. I haven't used it, but it should analyse regular .net dll's. The FxCop rule with the same name was removed long ago.
Related
I need to operate with variables that can be either positive or negative values, in a way that a method should be able to add or substract its absolute value, but without changing its sign.
I have the following code to do something alike:
public static class FloatExtensions
{
public static float sumToItsAbsolute(this float currentValue, float summedValue)
{
float currentAbsoluteValue = Math.Abs(currentValue);
return (currentValue / currentAbsoluteValue) * (currentAbsoluteValue + summedValue);
}
}
I can work with it, but I would like to know if there is a better approach.
EDIT: By "better", I mean more performant. And if there is no potential bugs I have not seen yet.
Thanks in advance.
If you spend a little time looking around the documentation of the Math class (which you are already using, so it is known to you already), you will notice that it offers a many handy functions. Among them a function to get the sign of a value.
Thus, instead of doing v / Math.Abs(v) to get the sign of v as values -1 or 1 (functional and creative, but cumbersome, and perhaps even prone to precision errors IEEE floating point operations might suffer from), you could just do Math.Sign(v) to get the sign.
With regards to bugs: I'd like to throw the ball back into your court and like to encourage you to think a little bit about the behavior of your code for yourself first. Note how your code is doing a division and a multiplication of a sum. Can you find values for currentValue where either the division or the multiplication might fail or yield the wrong result?
Performancewise, i really doubt this function is what would make your program slower than you desire. If your program is behaving slower than expected in some aspects, use a profiler to figure out the actual part(s) in your program that account for most of the performance costs instead of eyeballing and blind-guessing it.
I am fairly new to c# and trying to learn best practices. I've been faced with many situations over the last week in which I need to make a choice between longer+simpler code, or shorter code that combines several actions into a single statement. What are some standards you veteran coders use when trying to write clear, concise code? Here is an example of code I'm writing with two options. Which is preferable?
A)
if (ncPointType.StartsWith("A"))//analog points
{
string[] precisionString = Regex.Split(unitsParam.Last(), ", ");
precision = int.Parse(precisionString[1]);
}
else
precision = null;
B)
if (ncPointType.StartsWith("A"))//analog points
precision = int.Parse(Regex.Split(unitsParam.Last(), ", ")[1]);
else
precision = null;
There is no right or wrong. This is opinion based really.
However, remember that whether you add braces, add comments, add whitespace or whatever, it doesn't affect the performance or the size of the final assembly because the compiler optimizes it very well. So, why not go with the more verbose so that other programmers can follow it easier?
This is basically subject to the coding standard you choose. There is no right or wrong, just personal preference.
Agree as a team what you prefer.
But even better is:
precision = ncPointType.StartsWith("A")?
int.Parse(Regex.Split(unitsParam.Last(), ", ")):
null;
This expresses the function beiong executed (set the precision),
and the conditions that control how it is set, in even less code,
- without creating unnecessary temporary variables to hold a temporary result that is not used anywhere else.
If you ask me both A and B are bad practice. You should always use braces regardless of whether or not it is one line or multiple lines. This helps to prevent bugs in the future when people add additional lines of code to your if or else blocks and don't notice that the braces are missing.
Sometimes having "more" code, like using temporary variables, etc., will make your code easier to debug as you can hover over symbols. It's all about balance, and this comes with experience. Remember that you may not be the only one working on your code, so clarity above all.
Ultimately it comes down to your choice because it has nothing to do with performance, but I would also note that you cannot instantiate or define variables if you omit the curly braces:
int a = 0;
if (a == 0)
int b = 3;
Is not valid, whereas:
int a = 0;
if (a == 0)
{
int b = 3;
}
Is valid.
It's easier to debug option A, but option B is more concise while still being readable. You could make it even more concise with a ternary operator:
precision = ncPointType.StartsWith("A") ? int.Parse(Regex.Split(unitsParam.Last(), ", ")[1]) : null;
Although it's far more readable this way (and still works the same way!):
precision = ncPointType.StartsWith("A") ?
int.Parse(Regex.Split(unitsParam.Last(), ", ")[1]) :
null;
It's best to stick with the standards used in your project. Readability is far more important for maintainability than having less lines of code. In both options, the efficiency is the same, so you don't need to worry about speed in this case.
As everyone else here points out, there are no hard and fast "rules" per se, but there should probably be braces around both block of code; both A and B have characteristics similar to the Apple goto bug because the introduce potential ambiguity.
Consider:
void Main()
{
int x=1;
if (x==1)
Console.WriteLine("1");
else if (x==2)
Console.WriteLine("NOT 1");
Console.WriteLine("OK");
}
will produce
1
OK
To someone scanning the code, especially if it's surrounded by other, innocuous looking code, this could easily be misread as "only print "OK" if x==2). Obviously some people will spot it, some won't, but why introduce the danger for the want of a brace of braces :)
I'm working on a n image processing library which extends OpenCV, HALCON, ... . The library must be with .NET Framework 3.5 and since my experiences with .NET are limited I would like to ask some questions regarding the performance.
I have encountered a few specific things which I cannot explain to myself properly and would like you to ask a) why and b) what is the best practise to deal with the cases.
My first question is about Math.pow. I already found some answers here on StackOverflow which explains it quite well (a) but not what to do about this(b). My benchmark Program looks like this
Stopwatch watch = new Stopwatch(); // from the Diagnostics class
watch.Start();
for (int i = 0; i < 1000000; i++)
double result = Math.Pow(4,7) // the function call
watch.Stop()
The result was not very nice (~300ms on my computer) (I have run the test 10 times and calcuated the average value).
My first idea was to check wether this is because it is a static function. So I implemented my own class
class MyMath
{
public static double Pow (double x, double y) //Using some expensive functions to calculate the power
{
return Math.Exp(Math.Log(x) * y);
}
public static double PowLoop (double x, int y) // Using Loop
{
double res = x;
for(int i = 1; i < y; i++)
res *= x;
return res;
}
public static double Pow7 (double x) // Using inline calls
{
return x * x * x * x * x * x * x;
}
}
THe third thing I checked were if I would replace the Math.Pow(4,7) directly through 4*4*4*4*4*4*4.
The results are (the average out of 10 test runs)
300 ms Math.Pow(4,7)
356 ms MyMath.Pow(4,7) //gives wrong rounded results
264 ms MyMath.PowLoop(4,7)
92 ms MyMath.Pow7(4)
16 ms 4*4*4*4*4*4*4
Now my situation now is basically like this: Don't use Math for Pow. My only problem is just that... do I really have to implement my own Math-class now? It seems somehow ineffective to implement an own class just for the power function. (Btw. PowLoop and Pow7 are even faster in the Release build by ~25% while Math.Pow is not).
So my final questions are
a) am I wrong if I wouldn't use Math.Pow at all (but for fractions maybe) (which makes me somehow sad).
b) if you have code to optimize, are you really writing all such mathematical operations directly?
c) is there maybe already a faster (open-source^^) library for mathematical operations
d) the source of my question is basically: I have assumed that the .NET Framework itself already provides very optimized code / compile results for such basic operations - be it the Math-Class or handling arrays and I was a little surprised how much benefit I would gain by writing my own code. Are there some other, general "fields" or something else to look out in C# where I cannot trust C# directly.
Two things to bear in mind:
You probably don't need to optimise this bit of code. You've just done a million calls to the function in less than a second. Is this really going to cause big problems in your program?
Math.Pow is probably fairly optimal anyway. At a guess, it will be calling a proper numerics library written in a lower level language, which means you shouldn't expect orders of magnitude increases.
Numerical programming is harder than you think. Even the algorithms that you think you know how to calculate, aren't calculated that way. For example, when you calculate the mean, you shouldn't just add up the numbers and divide by how many numbers you have. (Modern numerics libraries use a two pass routine to correct for floating point errors.)
That said, if you decide that you definitely do need to optimise, then consider using integers rather than floating point values, or outsourcing this to another numerics library.
Firstly, integer operations are much faster than floating point. If you don't need floating point values, don't use the floating point data type. This generally true for any programming language.
Secondly, as you have stated yourself, Math.Pow can handle reals. It makes use of a much more intricate algorithm than a simple loop. No wonder it is slower than simply looping. If you get rid of the loop and just do n multiplications, you are also cutting off the overhead of setting up the loop - thus making it faster. But if you don't use a loop, you have to know
the value of the exponent beforehand - it can't be supplied at runtime.
I am not really sure why Math.Exp and Math.Log is faster. But if you use Math.Log, you can't find the power of negative values.
Basically int are faster and avoiding loops avoid extra overhead. But you are trading off some flexibility when you go for those. But it is generally a good idea to avoid reals when all you need are integers, but in this case coding up a custom function when one already exists seems a little too much.
The question you have to ask yourself is whether this is worth it. Is Math.Pow actually slowing your program down? And in any case, the Math.Pow already bundled with your language is often the fastest or very close to that. If you really wanted to make an alternate implementation that is really general purpose (i.e. not limited to only integers, positive values, etc.), you will probably end up using the same algorithm used in the default implementation anyway.
When you are talking about making a million iterations of a line of code then obviously every little detail will make a difference.
Math.Pow() is a function call which will be substantially slower than your manual 4*4...*4 example.
Don't write your own class as its doubtful you'll be able to write anything more optimised than the standard Math class.
I'm a beginner C# programmer, and to improve my skills I decided to give Project Euler a try. The first problem on the site asks you to find the sum of all the multiples of 3 and 5 under 1000. Since I'm essentially doing the same thing twice, I made a method to multiply a base number incrementally, and add the sum of all the answers togethor.
public static int SumOfMultiplication(int Base, int limit)
{
bool Escape = false;
for (int mult = 1; Escape == true; mult++)
{
int Number = 0;
int iSum = 0;
Number = Base * mult;
if (Number > limit)
return iSum;
else
iSum = iSum + Number;
}
regardless of what I put in for both parameters, it ALWAYS returns zero. I'm 99% sure it has something to do with the scope of the variables, but I have no clue how to fix it. All help is appreciated.
Thanks in advance,
Sam
Your loop never actually executes:
bool Escape = false;
for (int mult = 1; Escape == true; mult++)
Escape is set to false initially, so the first test fails (Escape == true returns false) and the body of the loop is skipped.
The compiler would have told you if you were trying to access variables outside of their defined scope, so that's not the problem. You are also missing a return statement, but that is probably a typo.
I would also note that your code never checks if the number to be added to the sum is actually a multiple of 3 or 5. There are other issues as well (for example, iSum is declared inside of the loop and initialized to 0 after each iteration), but I'll let you work that one out since this is practice. The debugger is your friend in cases like these :)
EDIT: If you need help with the actual logic I'll be happy to help, but I figure you want to work it out on your own if possible.
As others have pointed out, the problem is that the control flow does not do what you think it does. This is a common beginner problem.
My suggestion to you is learn how to use your debugger. Beginners often have this strange idea that they're not allowed to use tools to solve their coding problems; that rather, they have to reason out the defect in the program by simply reading it. Once the programs become more than a page long, that becomes impossible for humans. The debugger is your best friend, so get to know its features really well.
In this case if you'd stepped through the code in the debugger you'd see that the loop condition was being evaluated and then the loop was being skipped. At that point you wouldn't be asking "why does this return zero?", you'd be asking "why is the loop body always skipped?" Clearly that is a much more productive question to ask since that is actually the problem here.
Don't write any code without stepping through it in the debugger. Watch every variable, watch how it changes value (the debugger highlights variables in the watch windows right after they change value, by the way) and make sure that the control flow and the variable changes are exactly as you'd expect. Pay attention to quiet doubts; if anything seems out of the ordinary, track it down, and either learn why it is correct, or fix it until it is.
Regarding the actual problem: remember that 15, 30, 45, 60... are all multiples of both three and five, but you only want to add them to the sum once. My advice when solving Project Euler problems is to write code that is as like what you are trying to solve as is possible. Try writing the problem out in "pseudocode" first. I'd pseudocode this as:
sum = 0
for each positive number under 1000:
if number is multiple of three or five then:
add number to sum
Once you have that pseudocode you can notice its subtleties. Like, is 1000 included? Does the problem say "under 1000" or "up to 1000"? Make sure your loop condition considers that. And so on.
The closer the program reads like the problem actually being solved, the more likely it is to be correct.
It does not enter for loop because for condition is false.
Escape == true
returns false
Advice:
Using for loop is much simpler if you use condition as limit for breaking loop
for (int mult = 1; something < limit; mult++)
This way in most cases you do not need to check condition in loop
Most programming languages have have operator modulo division.
http://en.wikipedia.org/wiki/Modulo_operation
It might come handy whit this problem.
There are several problems with this code. The first, and most important, is that you are using the Escape variable only once. It is never set to false within your for loop, so it serves no purpose whatsoever. It should be removed. Second, isum is declared within your for loop, which means it will keep being re-initialized to 0 every time the loop executes. This means you will only get the last multiple, not the addition of all multiples. Here is a corrected code sample:
int iSum = 0;
for(int mult = 1; true; mult++)
{
int Number = Base * mult;
if(Number > limit)
return iSum;
else
iSum += Number;
}
I'm trying to work through the problems on projecteuler.net but I keep running into a couple of problems.
The first is a question of storing large quanities of elements in a List<t>. I keep getting OutOfMemoryException's when storing large quantities in the list.
Now I admit I might not be doing these things in the best way but, is there some way of defining how much memory the app can consume?
It usually crashes when I get abour 100,000,000 elements :S
Secondly, some of the questions require the addition of massive numbers. I use ulong data type where I think the number is going to get super big, but I still manage to wrap past the largest supported int and get into negative numbers.
Do you have any tips for working with incredibly large numbers?
Consider System.Numerics.BigInteger.
You need to use a large number class that uses some basic math principals to split these operations up. This implementation of a C# BigInteger library on CodePoject seems to be the most promising. The article has some good explanations of how operations with massive numbers work, as well.
Also see:
Big integers in C#
As far as Project Euler goes, you might be barking up the wrong tree if you are hitting OutOfMemory exceptions. From their website:
Each problem has been designed according to a "one-minute rule", which means that although it may take several hours to design a successful algorithm with more difficult problems, an efficient implementation will allow a solution to be obtained on a modestly powered computer in less than one minute.
As user Jakers said, if you're using Big Numbers, probably you're doing it wrong.
Of the ProjectEuler problems I've done, none have required big-number math so far.
Its more about finding the proper algorithm to avoid big-numbers.
Want hints? Post here, and we might have an interesting Euler-thread started.
I assume this is C#? F# has built in ways of handling both these problems (BigInt type and lazy sequences).
You can use both F# techniques from C#, if you like. The BigInt type is reasonably usable from other languages if you add a reference to the core F# assembly.
Lazy sequences are basically just syntax friendly enumerators. Putting 100,000,000 elements in a list isn't a great plan, so you should rethink your solutions to get around that. If you don't need to keep information around, throw it away! If it's cheaper to recompute it than store it, throw it away!
See the answers in this thread. You probably need to use one of the third-party big integer libraries/classes available or wait for C# 4.0 which will include a native BigInteger datatype.
As far as defining how much memory an app will use, you can check the available memory before performing an operation by using the MemoryFailPoint class.
This allows you to preallocate memory before doing the operation, so you can check if an operation will fail before running it.
string Add(string s1, string s2)
{
bool carry = false;
string result = string.Empty;
if (s1.Length < s2.Length)
s1 = s1.PadLeft(s2.Length, '0');
if(s2.Length < s1.Length)
s2 = s2.PadLeft(s1.Length, '0');
for(int i = s1.Length-1; i >= 0; i--)
{
var augend = Convert.ToInt64(s1.Substring(i,1));
var addend = Convert.ToInt64(s2.Substring(i,1));
var sum = augend + addend;
sum += (carry ? 1 : 0);
carry = false;
if(sum > 9)
{
carry = true;
sum -= 10;
}
result = sum.ToString() + result;
}
if(carry)
{
result = "1" + result;
}
return result;
}
I am not sure if it is a good way of handling it, but I use the following in my project.
I have a "double theRelevantNumber" variable and an "int PowerOfTen" for each item and in my relevant class I have a "int relevantDecimals" variable.
So... when large numbers is encountered they are handled like this:
First they are changed to x,yyy form. So if the number 123456,789 was inputed and the "powerOfTen" was 10, it would start like this:
theRelevantNumber = 123456,789
PowerOfTen = 10
The number was then: 123456,789*10^10
It is then changed to:
1,23456789*10^15
It is then rounded by the number of relevant decimals (for example 5) to 1,23456 and then saved along with "PowerOfTen = 15"
When adding or subracting numbers together, any number outside the relevant decimals are ignored. Meaning if you take:
1*10^15 + 1*10^10 it will change to 1,00001 if "relevantDecimals" is 5 but will not change at all if "relevantDecimals" are 4.
This method make you able to deal with numbers up doubleLimit*10^intLimit without any problem, and at least for OOP it is not that hard to keep track of.
You don't need to use BigInteger. You can do this even with string array of numbers.
class Solution
{
static void Main(String[] args)
{
int n = 5;
string[] unsorted = new string[6] { "3141592653589793238","1", "3", "5737362592653589793238", "3", "5" };
string[] result = SortStrings(n, unsorted);
foreach (string s in result)
Console.WriteLine(s);
Console.ReadLine();
}
static string[] SortStrings(int size, string[] arr)
{
Array.Sort(arr, (left, right) =>
{
if (left.Length != right.Length)
return left.Length - right.Length;
return left.CompareTo(right);
});
return arr;
}
}
If you want to work with incredibly large numbers look here...
MIKI Calculator
I am not a professional programmer i write for myself, sometimes, so sorry for unprofessional use of c# but the program works. I will be grateful for any advice and correction.
I use this calculator to generate 32-character passwords from numbers that are around 58 digits long.
Since the program adds numbers in the string format, you can perform calculations on numbers with the maximum length of the string variable. The program uses long lists for the calculation, so it is possible to calculate on larger numbers, possibly 18x the maximum capacity of the list.