In what situations would I specify an operation as unchecked? - C#

For example:
int value = Int32.MaxValue;
unchecked
{
    value += 1;
}
In what ways would this be useful? Can you think of any?

Use unchecked when:
You want to express a constant via overflow (this can be useful when specifying bit patterns)
You want arithmetic to overflow without causing an error
The latter is useful when computing a hash code - for example, in Noda Time the project is built with checked arithmetic for virtually everything apart from hash code generation. When computing a hash code, it's entirely normal for overflow to occur, and that's fine because we don't care about the result as a number - we just want it as a bit pattern.
That's just a particularly common example, but there may well be other times where you're really happy for MaxValue + 1 to be MinValue.
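For example (my own sketches, not from the answer):
// Case 1: expressing a bit pattern as a constant. Without unchecked,
// a constant that doesn't fit the target type is a compile-time error.
const int AllBitsSet = unchecked((int)0xFFFFFFFF);   // -1, bit pattern 0xFFFFFFFF

// Case 2: letting run-time arithmetic wrap instead of throwing
// (relevant in a build compiled with /checked):
int x = int.MaxValue;
int wrapped = unchecked(x * 31);   // wraps round rather than throwing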

Related

.NET vs Mono: different results for conversion from 2^32 as double to int

TL;DR Why does (int)Math.Pow(2,32) return 0 on Mono and Int32.MinValue on .NET?
While testing my .NET-written code on Mono I've stumbled upon the following line:
var i = X / ( (int)Math.Pow(2,32) );
Well, that line doesn't make much sense of course, and I've already changed it to use long.
I was, however, curious why my code didn't throw a DivideByZeroException on .NET, so I checked the return value of that expression on both Mono and .NET.
Can anyone please explain the results?
IMHO the question is academic; the documentation promises only that "the result is an unspecified value of the destination type", so each platform is free to do whatever it wants.
One should be very careful when casting results that might overflow. If there is a possibility of that, and it's important to get a specific result instead of whatever arbitrary implementation a given platform has provided, one should use the checked keyword and catch any OverflowException that might occur, handling it with whatever explicit behavior is desired.
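To illustrate that advice (a sketch of my own, with saturation chosen arbitrarily as the fallback):
double d = Math.Pow(2, 32);
int i;
try
{
    i = checked((int)d);    // throws OverflowException instead of
                            // yielding a platform-specific value
}
catch (OverflowException)
{
    i = int.MaxValue;       // explicitly chosen fallback (saturate)
}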
"the result is an unspecified value of the destination type". I thought it would be interesting to see what's actually happening in the .NET implementation.
It's to do with the OpCodes.Conv_I4 field in IL: "Conversion from floating-point numbers to integer values truncates the number toward zero. When converting from a float64 to a float32, precision can be lost. If value is too large to fit in a float32 (F), positive infinity (if value is positive) or negative infinity (if value is negative) is returned. If overflow occurs converting one integer type to another, the high order bits are truncated." It does once again say overflow is unspecified, however.

Why isn't an overflow exception raised?

The following runs fine without error and diff is 1:
int max = int.MaxValue;
int min = int.MinValue;
//Console.WriteLine("Min is {0} and max is {1}", min, max);
int diff = min - max;
Console.WriteLine(diff);
Wouldn't all programs then be suspect? a + b is no longer the sum of a and b, where a and b are of type int. Sometimes it is, but sometimes it is the sum* of a, b and 2 * int.MinValue.
* Sum as in the ordinary English meaning of addition, ignoring any computer knowledge or word size
In PowerShell, it looks better, but it still is not a hardware exception from the add operation. It appears to use a long before casting back to an int:
[int]$x = [int]::minvalue - [int]::maxvalue
Cannot convert value "-4294967295" to type "System.Int32". Error: "Value was either too large or too small for an Int32."
By default, overflow checking is turned off in C#. Values simply "wrap round" in the common way.
If you compiled the same code with /checked or used a checked { ... } block, it would throw an exception.
Depending on what you're doing, you may want checking or explicitly not want it. For example, in Noda Time we have overflow checking turned on by default, but explicitly turn it off for GetHashCode computations (where we expect overflow and have no problem with it) and computations which we know won't overflow, and where we don't want the (very slight) performance penalty of overflow checking.
See the checked and unchecked pages in the C# reference, or section 7.6.12 of the C# language specification, for more details.
By default, .NET will not check for numeric overflow when performing operations on numeric data types such as int.
You can enable checking either by building in checked mode (passing /checked to the compiler), or by using a checked block in a code segment:
checked
{
    int max = int.MaxValue;
    int i = max + max;       // throws OverflowException at run time
    Console.WriteLine(i);
}
(Note that writing the constants directly, as in int.MaxValue + int.MaxValue, would not even compile: a constant expression that overflows is a compile-time error in a checked context.)
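For contrast (my own sketch), the same arithmetic in an unchecked block wraps silently:
unchecked
{
    int max = int.MaxValue;
    int i = max + max;       // wraps around; no exception
    Console.WriteLine(i);    // prints -2
}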

C#/XNA - Multiplication faster than Division?

I saw a tweet recently that confused me (this was posted by an XNA coder, in the context of writing an XNA game):
Microoptimization tip of the day: when possible, use multiplication instead of division in high frequency areas. It's a few cycles faster.
I was quite surprised, because I always thought compilers were pretty smart (for example, using bit-shifting), and recently read a post by Shawn Hargreaves saying much the same thing. I wondered how much truth there was in this, since there are lots of calculations in my game.
I inquired, hoping for a sample, but the original poster was unable to give one. He did, however, say this:
Not necessarily when it's something like "center = width / 2". And I've already determined "yes, it's worth it". :)
So, I'm curious...
Can anyone give an example of some code where you can change a division to a multiplication and get a performance gain, where the C# compiler wasn't able to do the same thing itself?
Most compilers can do a reasonable job of optimizing when you give them a chance. For example, if you're dividing by a constant, chances are pretty good that the compiler can/will optimize that so it's done about as quickly as anything you can reasonably substitute for it.
When, however, you have two values that aren't known ahead of time and you need to divide one by the other, there's little the compiler can do with it - and for that matter, if there were much room for the compiler to optimize it, the CPU would do it itself, so the compiler wouldn't have to.
Edit: Your best bet for something like that (that's reasonably realistic) would probably be something like:
double scale_factor = get_input();   // get_input(): placeholder for a value only known at run time
for (int i = 0; i < values.Length; i++)   // values: assumed to be a double[]
    values[i] /= scale_factor;
This is relatively easy to convert to something like:
scale_factor = 1.0 / scale_factor;   // compute the reciprocal once...
for (int i = 0; i < values.Length; i++)
    values[i] *= scale_factor;       // ...then multiply, which is cheaper than dividing
I can't really guarantee much one way or the other about a particular compiler doing that. It's basically a combination of strength reduction and loop hoisting. There are certainly optimizers that know how to do both, but what I've seen of the C# compiler suggests that it may not (but I never tested anything exactly like this, and the testing I did was a few versions back...)
Although the compiler can optimize out divisions and multiplications by powers of 2, other numbers can be difficult or impossible to optimize. Try optimizing a division by 17 and you'll see why. This is of course assuming the compiler doesn't know that you are dividing by 17 ahead of time (it is a run-time variable, not a constant).
Bit late but never mind.
The answer to your question is yes.
Have a look at my article here, http://www.codeproject.com/KB/cs/UniqueStringList2.aspx, which uses information based on the article mentioned in the first comment to your question.
I have a QuickDivideInfo struct which stores the magic number and the shift for a given divisor, thus allowing division and modulo to be calculated using faster multiplication. I pre-computed (and tested!) QuickDivideInfos for a list of Golden Prime Numbers. For x64 at least, the .Divide method on QuickDivideInfo is inlined and is 3x quicker than using the divide operator (on an i5); it works for all numerators except int.MinValue and cannot overflow, since the multiplication is stored in 64 bits before shifting. (I've not tried on x86, but if it doesn't inline for some reason then the neatness of the Divide method would be lost and you would have to inline it manually.)
So the above will work in all scenarios (except int.MinValue) if you can precalculate. If you trust the code that generates the magic number/shift, then you can deal with any divisor at runtime.
Other well-known small divisors with a very limited range of numerators could be written inline and may well be faster if they don't need an intermediate long.
Division by a power of two: I would expect the compiler to deal with this (as in your width / 2 example) since it is constant. If it doesn't, then changing it to width >> 1 should be fine (for non-negative values).
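To make the multiply-and-shift idea concrete, here is a minimal sketch (mine, not the article's QuickDivideInfo code; 0x78787879 is the standard magic constant for dividing by 17):
// Division by the constant 17 via multiply-and-shift.
// 0x78787879 = ceil(2^35 / 17); the product is computed in 64 bits,
// so the intermediate multiplication cannot overflow.
static int DivideBy17(int x)
{
    int q = (int)((x * 0x78787879L) >> 35);   // floor(x / 17)
    return q - (x >> 31);                     // adjust floor -> truncation for negative x
}
// DivideBy17(1000)  == 58
// DivideBy17(-1000) == -58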
To give some numbers: this PDF of Pentium instructions and flags
http://cs.smith.edu/dftwiki/index.php/CSC231_Pentium_Instructions_and_Flags
lists the following cycle counts, and they aren't good:
IMUL 10 or 11
FMUL 3+1
IDIV 46 (32 bits operand)
FDIV 39
We are speaking of BIG differences.
while(start<=end)
{
int mid=(start+end)/2;
if(mid*mid==A)
return mid;
if(mid*mid<A)
{
start=mid+1;
ans=mid;
}
If i am doing this way the outcome is the TIME LIMIT EXCEEDED for square root of 2147483647
But if i am doing the following way then the thing is clear that for Division compiler responds faster than for multiplication.
while(start<=end)
{
int mid=(start+end)/2;
if(mid==A/mid)
return mid;
if(mid<A/mid)
{
start=mid+1;
ans=mid;
}
else
end=mid-1;
}

What would be a good hashCode for a DateRange class

I have the following class
public class DateRange
{
    private DateTime startDate;
    private DateTime endDate;

    public override bool Equals(object obj)
    {
        DateRange other = (DateRange)obj;
        if (startDate != other.startDate)
            return false;
        if (endDate != other.endDate)
            return false;
        return true;
    }
    ...
}
I need to store some values in a dictionary keyed with a DateRange like:
Dictionary<DateRange, double> tddList;
How should I override the GetHashCode() method of DateRange class?
I use this approach from Effective Java for combining hashes:
unchecked
{
    int hash = 17;
    hash = hash * 31 + field1.GetHashCode();
    hash = hash * 31 + field2.GetHashCode();
    ...
    return hash;
}
There's no reason that shouldn't work fine in this situation.
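Applied to the DateRange in the question, that would look something like this (my sketch of the pattern above, not code from the answer):
public override int GetHashCode()
{
    unchecked
    {
        int hash = 17;
        hash = hash * 31 + startDate.GetHashCode();
        hash = hash * 31 + endDate.GetHashCode();
        return hash;
    }
}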
It depends on the values I expect to see it used with.
If it was most often going to have different day values, rather than different times on the same day, and they were within a century of now, I would use:
unchecked
{
    int hash = startDate.Year + endDate.Year - 4007;
    hash *= 367 + startDate.DayOfYear;
    return hash * 367 + endDate.DayOfYear;
}
This distributes the bits well with the expected values, while reducing the number of bits lost in the shifting. Note that while there are cases where dependency on primes can be surprisingly bad at collisions (esp. when the hash is fed into something that uses a modulo of the same prime in trying to avoid collisions when producing a yet-smaller hash to distribute among its buckets), I've opted for primes just above the more obvious choices, as they're only just above and so still pretty "tight" for bit distribution. I don't worry much about using the same prime twice, as they're so "tight" in this way, but it does hurt if you have a hash-based collection with 367 buckets. This deals well (but not as well) with dates well into the past or future, but is dreadful if the assumption that there will be few or no ranges within the same day (differing in time) is wrong, as that information is entirely lost.
If I was expecting lots of ranges within the same day (or writing for general use by other parties, and not able to assume otherwise), I'd go for:
int startHash = startDate.GetHashCode();
return (((startHash >> 24) & 0x000000FF)
      | ((startHash >> 8) & 0x0000FF00)
      | ((startHash << 8) & 0x00FF0000)
      | unchecked((int)((startHash << 24) & 0xFF000000)))
    ^ endDate.GetHashCode();
Where the first method works on the assumption that the general-purpose GetHashCode in DateTime isn't as good as we want, this one depends on it being good, but mixes around the bits of one value.
It's good in dealing with the more obvious tricky cases such as the two values being the same, or a common distance from each other (e.g. lots of 1day or 1hour ranges). It's not as good at the cases where the first example works best, but the first one totally sucks if there are lots of ranges using the same day, but different times.
Edit: To give a more detailed response to Dour's concern:
Dour points out, correctly, that some of the answers on this page lose data. The fact is, all of them lose data.
The class defined in the question has 8.96077483×10^37 different valid states (or 9.95641648×10^36 if we don't care about the DateTimeKind of each date), and the output of GetHashCode has 4294967296 possible states (one of which - zero - is also going to be used as the hash code of a null value, which may be commonly compared with in real code). Whatever we do, we reduce information by a scale of 2.31815886×10^27. That's a lot of information we lost!
It's likely true that we lose more with some solutions than with others. Certainly, it's easy to prove some solutions can lose more than others by writing a valid, but really poor, answer.
(The worst possible valid solution is return 0;, which is valid as it never errors or mismatches on equal objects, but as poor as possible as it collides for all values. The performance of a hash-based collection becomes O(n), and a slow O(n) at that, as the constants involved are higher than in such O(n) operations as searching an unordered list.)
It's difficult to measure just how much is lost. How much more does shifting some bits before XORing lose than swapping bits, considering that XOR halves the amount of information left? Even the naïve x ^ y doesn't lose more than a swap-and-xor; it just collides more on common values, while swap-and-xor will collide on values where plain xor does not.
Once we've got a choice between solutions that are not losing much more information than necessary, but returning 4294967296 (or close to 4294967296) possible values with a good distribution between those values, then the question is no longer how much information is lost (the answer being that only 4.31376821×10^-28 of the original information remains) but which information is lost.
This is why my first suggestion above ignores time components. There are 864000000000 "ticks" (the 100-nanosecond units DateTime has a resolution of) in a day, and I throw away two chunks of those ticks (7.46496×10^23 possible values between the two) on purpose, because I'm thinking of a scenario where that information is not used anyway. In this case I've deliberately structured the mechanism in such a way as to pick which information gets lost; that improves the hash for the given situation, but makes it absolutely worthless if we had different values all with start and end dates on the same days but at different times.
Likewise x ^ y doesn't lose any more information than any of the others, but the information that it does lose is more likely to be significant than with other choices.
In the absence of any way to predict which information is likely to be of importance (esp. if your class will be public and its hash code used by external code), then we are more restricted in the assumptions we can safely make.
As a whole prime-mult or prime-mod methods are better in which information they lose than shift-based methods, except when the same prime is used in a further hashing that may take place inside a hash-based method, ironically with the same goal in mind (no number is relatively prime to itself! even primes) in which case they are much worse. On the other hand shift-based methods really fall down if fed into a shift-based further hash. There is no perfect hash for arbitrary data and arbitrary use (except when a class has few valid values and we match them all, in which case it's more strictly an encoding than a hash that we produce).
In short, you're going to lose information whatever you do, it's which you lose that's important.
Well, consider what characteristics a good hash function should have. It must:
be in agreement with Equals - that is, if Equals is true for two objects then the two hash codes have to also be the same.
never crash
And it should:
be very fast
give different results for similar inputs
What I would do is come up with a very simple algorithm; say, taking 16 bits from the hash code of the first and 16 bits from the hash code of the second, and combining them together. Make yourself a test case of representative samples; date ranges that are likely to be actually used, and see if this algorithm does give a good distribution.
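As a sketch of that idea (my own illustration; whether it distributes well is exactly what the suggested test case would tell you):
public override int GetHashCode()
{
    // pack the low 16 bits of each component's hash into one int
    int s = startDate.GetHashCode();
    int e = endDate.GetHashCode();
    return (s & 0xFFFF) | (e << 16);
}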
A common choice is to xor the two hashes together. This is not necessarily a good idea for this type because it seems likely that someone will want to represent the zero-length range that goes from X to X. If you xor the hashes of two equal DateTimes you always get zero, which seems like a recipe for a lot of hash collisions.
You have to shift one end of the range, otherwise two equal dates will hash to zero, a pretty common scenario I imagine:
return startDate.GetHashCode() ^ (endDate.GetHashCode() << 4);
return startDate.GetHashCode() ^ endDate.GetHashCode();
might be a good start. You have to check that you get good distribution when there is equal distance between startDate and endDate, but different dates.
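One way to check that (a hypothetical test harness, not from the answer): generate many ranges of equal length and count the distinct hash values.
var hashes = new HashSet<int>();   // needs System.Collections.Generic
DateTime start = new DateTime(2000, 1, 1);
for (int i = 0; i < 100000; i++)
{
    DateTime s = start.AddDays(i);
    DateTime e = s.AddDays(1);                           // equal-length ranges
    hashes.Add(s.GetHashCode() ^ (e.GetHashCode() << 4));
}
Console.WriteLine(hashes.Count);                         // closer to 100000 is better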

long/large numbers and modulus in .NET

I'm currently writing a quick custom encoding method where I stamp a key with a number to verify that it is a valid key.
Basically, I take whatever number comes out of the encoding and multiply it by a key.
I would then deploy those numbers to the user/customer who purchases the key. I wanted to simply use (Code % Key == 0) to verify that the key is valid, but for large values the mod operation does not seem to behave as expected.
Number = 468721387;
Key = 12345678;
Code = Number * Key;
Using the numbers above:
Code % Key == 11418772
And for smaller numbers it would correctly return 0. Is there a reliable way to check divisibility for a long in .NET?
Thanks!
EDIT:
Ok, tell me if I'm special and missing something...
long a = DateTime.Now.Ticks;
long b = 12345;
long c = a * b;
long d = c % b;
d == 10001 (Bad)
and
long a = DateTime.Now.Ticks;
long b = 12;
long c = a * b;
long d = c % b;
d == 0 (Good)
What am I doing wrong?
As others have said, your problem is integer overflow. You can make this more obvious by checking "Check for arithmetic overflow/underflow" in the "Advanced Build Settings" dialog. When you do so, you'll get an OverflowException when you perform DateTime.Now.Ticks * 12345.
One simple solution is just to change "long" to "decimal" (or "double") in your code.
In .NET 4.0, there is a new BigInteger class.
Finally, you say you're "... writing a quick custom encoding method ...", so a simple homebrew solution may be satisfactory for your needs. However, if this is production code, you might consider more robust solutions involving cryptography or something from a third-party who specializes in software licensing.
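For instance, a minimal sketch of the BigInteger route (my own illustration; requires a reference to System.Numerics):
using System.Numerics;

BigInteger number = 468721387;
BigInteger key = 12345678;
BigInteger code = number * key;        // exact product; BigInteger never overflows

Console.WriteLine(code % key == 0);    // True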
The answers that say that integer overflow is the likely culprit are almost certainly correct; you can verify that by putting a "checked" block around the multiplication and seeing if it throws an exception.
But there is a much larger problem here that everyone seems to be ignoring.
The best thing to do is to take a large step back and reconsider the wisdom of this entire scheme. It appears that you are attempting to design a crypto-based security system but you are clearly not an expert on cryptographic arithmetic. That is a huge red warning flag. If you need a crypto-based security system DO NOT ATTEMPT TO ROLL YOUR OWN. There are plenty of off-the-shelf crypto systems that are built by experts, heavily tested, and readily available. Use one of them.
If you are in fact hell-bent on rolling your own crypto, getting the math right in 64 bits is the least of your worries. 64 bit integers are way too small for this crypto application. You need to be using a much larger integer size; otherwise, finding a key that matches the code is trivial.
Again, I cannot emphasize strongly enough how difficult it is to construct correct crypto-based security code that actually protects real users from real threats.
Integer Overflow...see my comment.
The value of the multiplication you're doing overflows the int data type and causes it to wrap (int values range from -2,147,483,648 to 2,147,483,647).
Pick a more appropriate data type to hold a value as large as 5786683315615386 (the result of your multiplication).
UPDATE
Your new example changes things a little.
You're using long, but now you're using System.DateTime.Ticks which on Mono (not sure about the MS platform) is returning 633909674610619350.
When you multiply that by a large number, you are now overflowing a long just like you were overflowing an int previously. At that point, you'll probably need to use a double to work with the values you want (decimal may work as well, depending on how large your multiplier gets).
Apparently, your Code fails to fit in the int data type. Try using long instead:
long code = (long)number * key;
The (long) cast is necessary. Without the cast, the multiplication will be done in 32-bit integer form (assuming the number and key variables are typed int) and the result will be converted to long, which is not what you want. By casting one of the operands to long, you tell the compiler to perform the multiplication on two long numbers.
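To see the difference with the question's numbers (my sketch):
int number = 468721387;
int key = 12345678;

long wrong = number * key;          // 32-bit multiply wraps first, then widens to long
long right = (long)number * key;    // 64-bit multiply; exact result 5786683315615386

Console.WriteLine(wrong % key);     // 11418772, as in the question
Console.WriteLine(right % key);     // 0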
