I'm currently reading through C# 4.0 In A Nutshell, and in chapter 2 it states that if an int object has a value of int.MinValue (-2147483648) and the value is decreased, as the value cannot go any lower, it goes to the highest value, int.MaxValue (2147483647):
int a = int.MinValue;
Console.WriteLine(a); //-2147483648
a = a-1;
Console.WriteLine(a); //-2147483648
Console.WriteLine (a == int.MaxValue); // True
The book then goes on to mention how you can protect from this happening using checked.
However, I'm wondering why this is possible? Surely this could cause a lot of issues if its not protected with checked?
Is it not a bit of a flaw in the design? Why would anyone want a minimum value to become a maximum value?
Furthermore, I've noticed VB.NET will throw an 'Arithmetic overflow' instead of letting this happen:
Dim a As Integer
a = Integer.MaxValue
a = a + 1
Response.Write(a) 'Arithmetic overflow exception here
Sometimes it is desired behaviour. Other times, it simply isn't necessary to deal with and check for overflows, as they slow down program execution. Take a game drawing unbound (FPS). At 1000FPS, a small 1ms extra check could have a significant affect on framerate. Other times, it may be expected execution (I've only seen one use of this to date) - it flows on from what happens in C#'s ancestor - C++ and C.
Also, please note that the default behaviour (checked or unchecked) varies between configurations and environments:
If neither checked nor unchecked is used, a constant expression uses
the default overflow checking at compile time, which is checked.
Otherwise, if the expression is non-constant, the run-time overflow
checking depends on other factors such as compiler options and
environment configuration.
Since the code you posted is non-constant (and also due to various environment variables), no checking for overflows is performed.
It is a matter of language design - why this happens has to do with the binary representation of numbers. In .NET this is done with two's complement, hence the overflow issue.
The designers of VB.NET have decided to not allow such overflow/underflow issues.
The designers of C# have decided to leave control to the programmer.
It is possible that your programm automatically check overflow: right click on project file,
select Build, than select Advanced button in right bottom corner. In the opened window check Check Arithmetical Overflow checkbutton.
Related
I was tinkering with IP packet 'parsers' when I noticed something odd.
When it came to parsing the IP addresses, in C#
private uint srcAddress;
// stuff
srcAddress = (uint)(binaryReader.ReadInt32());
does the trick, so you'd think this VB.Net equivallent
Private srcAddress As UInteger
'' stuff
srcAddress = CUInt(binaryReader.ReadInt32())
would do the trick too. It doesn't. This :
srcAddress = reader.ReadUInt32()
however will.
Took some time to discover, but what have I dicovered -- if anything ? Why is this ?
VB.NET, by default, does something that C# doesn't do by default. It always check for numerical overflow. And that will trigger in your code, IP addresses whose last bit is 1 instead of 0 will produce a negative number and that cannot be converted to UInteger. A data type that can only store positive 32-bit numbers.
C# has this option too, you'd have to explicitly use the checked keyword in your code. Or use the same option that VB.NET projects have turned on by default: Project + Properties, Build tab, Advanced, tick the "Check for arithmetic overflow/underflow" checkbox. The same option in VB.NET project is named "Remove integer overflow checks", off by default.
Do note how these defaults affected the syntax of the languages as well. In C# you have to write a cast to convert a value to an incompatible value type. Not necessary in VB.NET, the runtime check keeps you out of trouble. It is very bad kind of trouble to have, overflow can produce drastically bad results. Not in your case, that happens, an IP address really is an unsigned number.
Do keep the other quirk about IP-addresses in mind, sockets were first invented on Unix machines that were powered by LSD and big-endian processors. You must generally use IPAddress.NetworkToHostOrder() to get the address in the proper order. Which only has overloads that take a signed integer type as the argument. So using ReadInt32() is actually correct, assuming it is an IPv4 address, you pass that directly to NetworkToHostOrder(). No fear of overflow.
I have made a calculator with C# (in VS 2008), but I can't understand why the
checked{iCurrent = (iCurrent * 10) + i;}
can check the overflow, can someone explain this? Thanks.
This is my code:
try
{
//get the typed
long iCurrent=long.Parse(textOut.Text);
if(bNumBegins)
{
iCurrent = i;
bNumBegins = false;
}
else
{
//check whether overflow
checked{iCurrent = (iCurrent * 10) + i;}
}
textOut.Text = iCurrent.ToString();
}
When you use the checked keyword you ask the compiler to automatically generate code that will test whether overflow has occurred after arithmetic operations. If overflow is detected, an OverflowException is thrown.
The default in C# is to not check for such overflow when performing arithmetic operations. The default can be changed to check all operations. Regardless of the default, the checked and unchecked keywords can be used to selectively check or ignore overflow as needed.
#DavidPilkington Yes, why need * 10? Thanks
I think that you are confused here.
The code is multiplying the iCurrent variable by 10 and then adding one. This is the desired effect that the coder wanted. The checked keyword is used to make sure that there is no OverflowException.
The *10 is not needed for the checked, the check is "needed" for the operation to ensure that the number is not too large.
Here is the MSDN documention on checked. I suggest you read through it and the examples to gain a better understanding.
I am working on a calculation module using C#, and I bumped on this :
double v = 4 / 100;
I know this is a wrong initialization that returns v = 0.0 instead of v = 0.04
The c# rules says I must ensure at least one of the member is a double, like this :
double v = (double) 4 / 100;
double v = 4.0 / 100;
However, I have many many initializations of that kind that involves integer variables operations, and I feel lazy to browse my code line by line to detect such mistakes.
Instead, is it possible to get warned by the compiler about this ?
Alright, after some playing around and what not, I have a solution. I used this article to come to this solution.I use StyleCop, so you'll need to get and install that. Then, you can download my C# project MathematicsAnalyzer.
First off, I did not account for all type conversion mismatches. In fact, I only accommodate one part.
Basically, I check to see if the line contains "double" followed by a space. I do know that could lead to false warnings, because the end of a class could be double or any number of other things, but I'll leave that to you to figure out how to properly isolate the type.
If a match is found, I check to see that it matches this regex:
double[ ][A-Za-z0-9]*[ ]?=(([ ]?[0-9]*d[ ]?/[ ]?[0-9]*;)|[ ]?[0-9]*[ ]?/[ ]?[0-9]*d;)
If it does -not- match this regex, then I add a violation. What this regex will match on is any of the following:
double i=4d / 100;
double i = 4d / 100;
double i = 4 / 100d;
double i = 4/ 100d;
double i = 4 /100d;
double i = 4/100d;
double i=4d / 100;
double i=4 / 100d;
double i=4/100d;
Any of the above will not create a violation. As it is currently written, pretty much if a 'd' isn't used, it'll throw a violation. You'll need to add extra logic to account for the other possible ways of explicitly casting an operand. As I'm writing this, I've just realized that having a 'd' on both operands will most likely throw an exception. Whoops.
And lastly, I could not get StyleCop to display my violation properly. It kept giving me an error about the rule not existing, and even with a second pair of eyes on it, we could not find a solution, so I hacked it. The error shows the name of the rule you were trying to find, so I just put the name of the rule as something descriptive and included the line number in it.
To install the custom rule, build the MathematicalAnalyzer project. Close Visual Studio and copy the DLL into the StyleCop install directory. When you open Visual Studio, you should see the rule in the StyleCop settings. Step 5 and 6 of the article I used shows where to do that.
This only gets one violation at a time throughout the solution, so you'll have to fix the violation it shows, and run StyleCop again to find the next one. There may be a way around that, but I ran out of juice and stopped here.
Enjoy!
This article explains how to set up custom Code Analysis rules that, when you run Code Analysis, can show warnings and what not.
http://blog.tatham.oddie.com.au/2010/01/06/custom-code-analysis-rules-in-vs2010-and-how-to-make-them-run-in-fxcop-and-vs2008-too/
Yes, I am using a profiler (ANTS). But at the micro-level it cannot tell you how to fix your problem. And I'm at a microoptimization stage right now. For example, I was profiling this:
for (int x = 0; x < Width; x++)
{
for (int y = 0; y < Height; y++)
{
packedCells.Add(Data[x, y].HasCar);
packedCells.Add(Data[x, y].RoadState);
packedCells.Add(Data[x, y].Population);
}
}
ANTS showed that the y-loop-line was taking a lot of time. I thought it was because it has to constantly call the Height getter. So I created a local int height = Height; before the loops, and made the inner loop check for y < height. That actually made the performance worse! ANTS now told me the x-loop-line was a problem. Huh? That's supposed to be insignificant, it's the outer loop!
Eventually I had a revelation - maybe using a property for the outer-loop-bound and a local for the inner-loop-bound made CLR jump often between a "locals" cache and a "this-pointer" cache (I'm used to thinking in terms of CPU cache). So I made a local for Width as well, and that fixed it.
From there, it was clear that I should make a local for Data as well - even though Data was not even a property (it was a field). And indeed that bought me some more performance.
Bafflingly, though, reordering the x and y loops (to improve cache usage) made zero difference, even though the array is huge (3000x3000).
Now, I want to learn why the stuff I did improved the performance. What book do you suggest I read?
CLR via C# by Jeffrey Richter.
It is such a great book that someone stolen it in my library together with C# in depth.
The CLR is not involved at all here, this should all be translated to straight machine code without calls into the CLR. The JIT compiler is responsible for generating that machine code, it has an optimizer that tries to come up with the most efficient code. It has limitations, it cannot spend a large amount of time on it.
One of the important things it does is figuring out what local variables should be stored in the CPU registers. That's something that changed when you put the Height property in a local variable. It possibly decided to store that variable in a register. But now there's one less available to store another variable. Like the x or y variable, one that's critical for speed. Yes, that will slow it down.
You got a bad diagnostic about the outer loop. That could possibly be caused by the JIT optimizer re-arranging the loop code, giving the profiler a harder time mapping the machine code back to the corresponding C# statement.
Similarly, the optimizer might have decided that you were using the array inefficiently and switched the indexing order back. Not so sure it actually does that, but not impossible.
Anyhoo, the only way you can get some insight here is by looking at the generated machine code. There are many decent books about x86 assembly code, although they might be a bit hard to find these days. Your starting point is Debug + Windows + Disassembly.
Keep in mind however that even the machine code is not a very good predictor of how efficient code is going to run. Modern CPU cores are enormously complicated and the machine code is no longer representative for what actually happens inside the core. The only tried and true way is what you've already been doing: trial and error.
Albin - no. Honestly I didn't think that running outside a profiler would change the performance difference, so I didn't bother. You think I should have? Has that been a problem for you before? (I am compiling with optimizations on though)
Running under a debugger changes the performance: when it's being run under a debugger, the just-in-time compiler automatically disables optimizations (to make it easier to debug)!
If you must, use the debugger to attach to an already-running already-JITted process.
One thing you should know about working with Arrays is that the CLR will always make sure that array-indices are not out-of-bounds. It has an optimization for 1-dimensional arrays but not for 2+ dimensions.
Knowing this, you may want to benchmark MyCell Data[][] instead of MyCell Data[,]
Hm, I don't think that the loop enrolling is the real problem.
1. I'd try to avoid accessing the array Data three times per inner loop.
2. I'd also recommend, to re-think the three Add statements: you are apparently accessing a collection three times to add trivial some data. Make it only one access per iteration and add a data type containing three entries:
for (int y = 0; ... {
tTemp = Data[x, y];
packedCells.Add(new {
tTemp.HasCar, tTemp.RoadState, tTemp.Population
});
}
Another look reveals, that you are basically vectorizing a matrix by copying it into an array (or some other sequential collection)... Is that necessary at all? Why don't you just define a specialized indexer which simulates that linear access? Even better, if you only need to enumerate the entries (in that example you do, no random access required), why don't you use an adequate LINQ expression?
Point 1) Educated guesses are not the way to do performance tuning. In this case I can guess about as well as most, but guessing is the wrong way to do it.
Point 2) Profilers need to be well understood before you know what they're actually telling you. Here's a discussion of the issues. For example, what many profilers do is tell you "where the program spends its time", i.e. where the program counter spends its time, so they are almost absolutely blind to time requested by function calls, which is what your inner loop seems to consist of.
I do a lot of performance tuning, and here is what I do. I cycle between two activities:
Overall time measurement. This doesn't require special tools. I'm not trying to measure individual routines.
"Bottleneck" location. This does not require running the code at any kind of speed, because I'm not measuring. What I'm doing is locating lines of code that are responsible for a significant percent of time. I know which lines they are because they are on the stack for that percent, and stack samples easily find them.
Once I find a "bottleneck" and fix it, I go back to the first step, measure what percent of time I saved, and do it all again on the next "bottleneck", typically from 2 to 6 times. I am helped by the "magnification effect", in which a fixed problem magnifies the percentage used by remaining problems. It works for both macro and micro optimization.
(Sorry if I can't write "bottleneck" without quotes, because I don't think I've ever found a performance problem that resembled the neck of a bottle. Rather they were all simply doing things that didn't really need to be done.)
Since the comment might be overseen, I repeat myself: it is quite cumbersome to optimize code which is per se overfluous. You do not really need to explicitely linearize your matrix at all, see the comment above: Define a linearizing adapter which implements IEnumerable<MyCell> and feed it into the formatter.
I am getting a warning when I try to add another answer, so I am going to recycle this one.. :) After reading Steve's comments and thinking about it for a while, I suggest the following:
If serializing a multi-dimensional array is too slow (haven't tryied, I just believe you...) don't use it at all! It appears, that your matrix is not sparse and has fixed dimensions. So define the structure holding your cells as simple linear array with indexer:
[Serializable()]
class CellMatrix {
Cell [] mCells;
public int Rows { get; }
public int Columns { get; }
public Cell this (int i, int j) {
get {
return mCells[i + Rows * j];
}
// setter...
}
// constructor taking rows/cols...
}
A thing like this should serialize as fast as native Array does... I don't recommend hard coding the layout of Cell in order to save few bytes there...
Cheers,
Paul
I am accustomed to C# not performing overflow checks, as the language spec states (ยง7.5.12):
For non-constant expressions (expressions that are evaluated at run-time) that are not enclosed by any checked or unchecked operators or statements, the default overflow checking context is unchecked unless external factors (such as compiler switches and execution environment configuration) call for checked evaluation.
I took advantage of this when doing an array bounds check in low-level code:
if ((uint)index >= (uint)TotalCount)
...
If index is negative, I expect it to become a large positive number so that it exceeds TotalCount. However, to my surprise, a negative number produces OverflowException, and I have to wrap the expression in unchecked(). I looked through the project options in Visual Studio and I do not see an option to enable or disable overflow checking. So why might it be on here?
It should be in the project.
Double click the Properties folder.
Build tab.
Click the Advanced... button.
Uncheck "Check for arithmetic overflow/underflow".