What actually happens when a Byte overflows?
Say we have
byte byte1 = 150; // 10010110
byte byte2 = 199; // 11000111
If we now do this addition
byte byte3 = byte1 + byte2;
I think we'll end up with byte3 = 93, but what actually happens? Did I overwrite some other memory somehow, or is this totally harmless?
It's quite simple. The addition produces a number that needs more than 8 bits. The ninth bit (a one) just 'falls off' and you are left with the remaining 8 bits, which form the number 93.
(yes it's harmless)
The top bits will be truncated. It is not harmful to any other memory, it is only harmful in terms of unintended results.
In C# if you have
checked { byte byte3 = (byte)(byte1 + byte2); }
It will throw an overflow exception. Code is compiled unchecked by default. As the other answers are saying, the value will 'wrap around', i.e. byte3 = (byte)((byte1 + byte2) & 0xFF);
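A minimal sketch of the difference, using the values from the question (the cast is needed because byte + byte is evaluated as int):
using System;

byte byte1 = 150;
byte byte2 = 199;

// Unchecked (the default at run time): the narrowing conversion keeps only the low 8 bits.
byte wrapped = unchecked((byte)(byte1 + byte2));   // 349 & 0xFF = 93

// Checked: the same narrowing conversion throws instead of wrapping.
try
{
    byte overflowed = checked((byte)(byte1 + byte2));
}
catch (OverflowException)
{
    Console.WriteLine("349 does not fit in a byte");
}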
The carry flag gets set... but besides the result not being what you expect, there should be no ill effects.
Typically (the exact behaviour will depend on the language and platform), the result will be taken modulo 256, i.e. 150 + 199 = 349, and 349 mod 256 = 93.
This shouldn't affect any other storage.
Since you have tagged your question C#, C++ and C, I'll answer about C and C++. In C++, overflow on signed types, including sbyte (which, I believe, is signed char in C/C++), results in undefined behavior. However, for unsigned types, such as byte (which is unsigned char in C++), the result is taken modulo 2^n, where n is the number of bits in the unsigned type. In C# the second rule holds, and the signed types generate an exception if they are in a checked block. I may be wrong in the C# part.
Overflow is harmless in C# - you won't overflow memory - you simply get the last 8 bits of the result. If you want this to throw an exception, use the 'checked' keyword. Note also that byte + byte gives int, so you may need to cast back to byte.
The behavior depends on the language.
In C and C++, signed overflow is undefined and unsigned overflow has the behavior you mentioned (although there is no byte type).
In C#, you can use the checked keyword to explicitly say you want to receive an exception if there is overflow and the unchecked keyword to explicitly say you want to ignore it.
The leading bit just drops off.
An arithmetic overflow occurs: 150 + 199 = 349, which is 1 0101 1101 in binary. The upper bit is dropped and the byte becomes 0101 1101, i.e. 93; in other words, the number of bits a byte can hold was exceeded.
No damage was done - e.g. the memory didn't overflow to another location.
Let's look at what actually happens, in C (assuming you've got an appropriate datatype - as some have pointed out, C doesn't have a "byte" datatype, but there are 8-bit datatypes which can be added). If these bytes are declared on the stack, they exist in main memory; at some point, the bytes will get copied to the processor for the operation (I'm skipping over several important steps, such as processor caching...). Once in the processor, they will be stored in registers; the processor will execute an add operation upon those two registers to add the data together.

Here's where the cause of confusion occurs. The CPU will perform the add operation in the native (or sometimes, specified) datatype. Let's say the native type of the CPU is a 32-bit word (and that that datatype is what is used for the add operation); that means these bytes will be stored in 32-bit words with the upper 24 bits unset; the add operation will happily produce a result wider than 8 bits inside the target 32-bit word. But (and here's the important bit) when the data is copied back from the register to the stack, only the lowest 8 bits (the byte) will be copied back to the target variable's location on the stack. (Note that there's some complexity involved with byte packing and the stack here as well.)
So, here's the upshot; the add causes an overflow (depending on the specific processor instruction chosen); the data, however, is copied out of the processor into a datatype of the appropriate size, so the overflow is unseen (and harmless, assuming a properly written compiler).
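A rough C# illustration of that widen-then-truncate sequence (in C#, the addition of two bytes is itself performed at int width):
byte byte1 = 150;                       // 1001 0110
byte byte2 = 199;                       // 1100 0111
int wide = byte1 + byte2;               // 349 = 1 0101 1101 - the addition happens in a wider type
byte narrow = unchecked((byte) wide);   // 93  =   0101 1101 - only the low 8 bits are copied back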
As far as C# goes, adding two values of type byte together results in a value of type int which must then be cast back to byte.
Therefore your code sample will result in a compiler error without a cast back to byte as in the following.
byte byte1 = 150; // 10010110
byte byte2 = 199; // 11000111
byte byte3 = (byte)(byte1 + byte2);
See MSDN for more details on this. Also, see the C# language specification, section 7.3.6 Numeric promotions.
Related
I have this expression
long balance = (long)answer.Find(DppGlobals.TAG_TI_BALANCE).Get_QWORD();
which raises exception that there was overflow. The value on the right hand side is of unsigned type and has value: 18446744073708240732.
How to avoid this exception, use unchecked?
PS: the equivalent C++ implementation returned balance = -1310984; I need the same value here too.
Why is there such an exception?
By using unchecked, I indeed now also get -1310984 on the C# side. Can someone advise - am I losing data somehow?
Using unchecked will also truncate your value, but you will not get an exception; you will just get the wrong number. I think you should change your code rather than mask the exception.
long balance = unchecked((long)answer.Find(DppGlobals.TAG_TI_BALANCE).Get_QWORD());
Why is there such an exception?
It is a good thing that you get an error. If you are working in a domain where everything is numbers, you can make a mistake and not realize it, so the runtime shows you the error.
Given the comments, it sounds like you do just need to use unchecked arithmetic - but you should be concerned about the use of ulong in your API. If your aim is to just propagate 8 bytes of data, and interpret it as either an unsigned integer or a signed integer depending on context, then you're fine - so long as nothing performs arithmetic on it in the "wrong" form.
It's important to understand why this happens though, and it's much easier to explain that with small numbers. I'll use byte and sbyte as an example. The range of byte is 0 to 255 inclusive. The range of sbyte is -128 to 127 inclusive. Both can store 256 different values. No problem.
Now, using unchecked conversions, it's fine to say:
byte a = GetByteFromSomewhere();
sbyte b = (sbyte) a;
StoreSByteSomewhere(b);
...
sbyte b = GetSByteFromStorage();
byte a = (byte) b;
If that's all you're doing, that's fine. A byte value of 255 will become an sbyte value of -1 and vice versa - basically it's just interpreting the same bit pattern in different ways.
But if you're performing other operations on the value when it's being handled as the "wrong" type, then you could get unexpected answers. For example, consider "dividing by 3". If you divide -9 by 3, you get -3, right? Whereas if you have:
byte original = unchecked((byte) -9);  // stores the bit pattern 0xF7, i.e. 247 as a byte
byte divided = (byte) (original / 3);  // the division is performed on 247, giving 82
sbyte result = (sbyte) divided;        // 82
... then you end up with a result of 82 instead of -3, because the arithmetic was performed on the byte value 247 rather than on -9.
Now that's all for unchecked conversions. When you have a checked conversion, it doesn't just interpret the bits - it treats the value as a number instead. So for example:
// In a checked context...
byte a = 128;
sbyte b = (sbyte) a; // Bang! Exception
That's because 128 (as a number) is outside the range of sbyte. This is what's happening in your case - the number 18446744073708240732 is outside the range of long, so you're getting an exception. The checked conversion is treating it as a number which can be range-checked, rather than an unchecked conversion just reinterpreting the bits as a long (which leads to the negative number you want).
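A minimal sketch of both conversions applied to the value from the question:
using System;

ulong raw = 18446744073708240732UL;

// Checked: the conversion treats raw as a number, and it is larger than long.MaxValue.
try
{
    long asNumber = checked((long) raw);
}
catch (OverflowException)
{
    Console.WriteLine("value is out of range for long");
}

// Unchecked: the same 64 bits are reinterpreted as a signed (two's complement) value,
// which gives the negative number you were expecting.
long asBits = unchecked((long) raw);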
I have this code in C:
long data1 = 1091230456;
*(double*)&((data1)) = 219999.02343875566
When I use the same code in C#, the result is:
*(double*)&((data1)) = 5.39139480005278E-315
but if I define another variable in C#:
unsafe
{long *data2 = &(data1);}
now:
*(double*)&((data2)) = 219999.02343875566
Why the difference?
Casting pointers is always tricky, especially when you don't have guarantees about the layout and size of the underlying types.
In C#, long is always a 64-bit integer and double is always 64-bit floating point number.
In C, long can easily end up being smaller than the 64-bits needed. If you're using a compiler that translates long as a 32-bit number, the rest of the value will be junk read from the next piece of memory - basically a "buffer" overflow.
On Windows, you usually want to use long long for 64-bit integers. Or better, use something like int64_t, where you're guaranteed to have exactly 64 bits of data. Or best of all, don't cast pointers.
C integer types can be confusing if you have a Java / C# background. They give you guarantees about the minimal range they must allow, but that's it. For example, int must be able to hold values in the [−32767,+32767] range (note that it's not −32768 - C had to support one's complement machines, which had two zeroes), close to C#'s short. long must be able to hold values in the [−2147483647,+2147483647] range, close to C#'s int. Finally, long long is close to C#'s long, having at least the [−(2^63−1),+2^63−1] range. float and double are specified even more loosely.
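For comparison, the C# side of that mapping is fixed; a quick sketch that just prints the guaranteed ranges:
using System;

Console.WriteLine($"short: {short.MinValue} .. {short.MaxValue}");   // -32768 .. 32767
Console.WriteLine($"int:   {int.MinValue} .. {int.MaxValue}");       // -2147483648 .. 2147483647
Console.WriteLine($"long:  {long.MinValue} .. {long.MaxValue}");     // -9223372036854775808 .. 9223372036854775807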
Whenever you cast pointers, you throw away even the tiny bits of abstraction C provides you with - you work with the underlying hardware layouts, whatever those are. This is one road to hell and something to avoid.
Sure, these days you probably will not find one's complement numbers, or other floating points than IEEE 754, but it's still inherently unsafe and unpredictable.
EDIT:
Okay, reproducing your example fully in a way that actually compiles:
unsafe
{
long data1 = 1091230456;
long *data2 = &data1;
var result = *(double*)&((data2));
}
result ends up being 219999.002675845 for me, close enough to make it obvious. Let's see what you're actually doing here, in more detail:
Store 1091230456 in a local data1
Take the address of data1, and store it in data2
Take the address of data2, cast it to a double pointer
Take the double value of the resulting pointer
It should be obvious that whatever value ends up in result has little relation to the value you stored in data1 in the first place!
Printing out the various parts of what you're doing will make this clearer:
unsafe
{
long data1 = 1091230456;
long *pData1 = &data1;
var pData2 = &pData1;
var pData2Double = (double*)pData2;
var result = *pData2Double;
new
{
data1 = data1,
pData1 = (long)pData1,
pData2 = (long)pData2,
pData2Double = (long)pData2Double,
result = result
}.Dump();
}
This prints out:
data1: 1091230456
pData1: 91941328
pData2: 91941324
pData2Double: 91941324
result: 219999.002675845
This will vary according to many environmental settings, but the critical part is that pData2 is pointing to memory four bytes in front of the actual data! This is because of the way the locals are allocated on stack - pData2 is pointing to pData1, not to data1. Since we're using 32-bit pointers here, we're reading the last four bytes of the original long, combined with the stack pointer to data1. You're reading at the wrong address, skipping over one indirection. To get back to the correct result, you can do something like this:
var pData2Double = (double**)pData2;
var result = *(*pData2Double);
This results in 5.39139480005278E-315 - the original value produced by your C# code. This is the more "correct" value, as far as there can even be a correct value.
The obvious answer here is that your C code is wrong as well - either due to different operand semantics, or due to some bug in the code you're not showing (or again, using a 32-bit integer instead of 64-bit), you end up with a pointer to a pointer to the value you want, and you mistakenly build the resulting double on a scrambled value that includes part of the original long, as well as the stack pointer - in other words, exactly one of the reasons you should be extra cautious whenever using unsafe code. Interestingly, this also implies that when compiled as a 64-bit executable, the result will be entirely decoupled from the value of data1 - you'll have a double built on the stack pointer exclusively.
Don't mess with pointers until you understand indirection very, very well. They have a tendency to "mostly work" when used entirely wrong. Then you change a tiny part of the code (for example, in this code you could add a third local, which could change where pData1 is allocated) or move to a different architecture (32-bit vs. 64-bit is quite enough in this example), or a different compiler, or a different OS... and it breaks completely. You don't guess around your way with pointers. Either you know exactly what every single expression in the code means, or you shouldn't deal with pointers at all.
Looking at the C# numeric data types, I noticed that most of the types have a signed and unsigned version. I noticed that whereas the "default" int, short and long are signed and have their unsigned counterparts in uint, ushort and ulong, the "default" byte is instead unsigned - and has a signed counterpart in sbyte.
Just out of curiosity, why is byte so different from the rest? Was there a specific reason behind this or it is "just the way things are"?
Hope the question isn't too confusing due to my phrasing and excessive use of quotes. Heh..
I would say a byte is not considered a numeric type but defines a structure 8 bits in size. Besides, there is no signed byte notion; it is unsigned. Numbers, on the other hand, are firstly considered to be signed, so stating that they are unsigned (which is less common) warrants the prefix.
[EDIT]
Forgot there is a signed byte (sbyte). I suppose it is rather a matter of history and practical application: ints are more common than uints, and byte is more common than sbyte.
Historically the terms byte, nibble and bit indicate a unit of storage, a mnemonic or code...not a numeric value. Having negative mega-bytes of memory or adding ASCII codes 1 and 2 expecting code 3 is kinda silly. In many ways there is no such thing as a signed "byte". Sometimes the line between "thing" and "value" is very blurry....as with most languages that treat byte as a thing and a value.
It's more so a degree of corruption of the terms. A byte is not inherently numeric in any form, it's simply a unit of storage.
However, bytes, characters, and 8-bit signed/unsigned integers have had their names used interchangeably where they probably should not have:
Byte denotes 8 bits of data, and says nothing about the format of the data.
Character denotes some data that stores a representation of a single text character.
"UINT8"/"INT8" denotes 8 bits of data, in signed or unsigned format, storing numeric integer values.
It really just comes down to being intuitive versus being consistent. It probably would have been cleaner if the .NET Framework used System.UInt8 and System.Int8 for consistency with the other integer types. But yeah it does seem a bit arbitrary.
For what it's worth, MSIL (which all .NET languages compile to anyhow) is more consistent in that an sbyte is called an int8 and a byte is called an unsigned int8, a short is called an int16, etc.
But the term byte is typically not used to describe a numeric type but rather a set of 8 bits such as when dealing with files, serialization, sockets, etc. For example if Stream.Read worked with a System.Int8[] array, that would be a very unusual looking API.
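For example, a typical read works in byte[] buffers; a small sketch (the file name here is made up):
using System;
using System.IO;

using FileStream stream = File.OpenRead("example.bin");   // hypothetical file
byte[] buffer = new byte[4096];                           // raw storage, not "numbers"
int bytesRead = stream.Read(buffer, 0, buffer.Length);
Console.WriteLine($"Read {bytesRead} bytes");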
In my C# source code I may have declared integers as:
int i = 5;
or
Int32 i = 5;
In the currently prevalent 32-bit world they are equivalent. However, as we move into a 64-bit world, am I correct in saying that the following will become the same?
int i = 5;
Int64 i = 5;
No. The C# specification rigidly defines that int is an alias for System.Int32 with exactly 32 bits. Changing this would be a major breaking change.
The int keyword in C# is defined as an alias for the System.Int32 type and this is (judging by the name) meant to be a 32-bit integer. To the specification:
CLI specification section 8.2.2 (Built-in value and reference types) has a table with the following:
System.Int32 - Signed 32-bit integer
C# specification section 8.2.1 (Predefined types) has a similar table:
int - 32-bit signed integral type
This guarantees that both System.Int32 in CLR and int in C# will always be 32-bit.
Will sizeof(testInt) ever be 8?
No, sizeof(testInt) is an error. testInt is a local variable. The sizeof operator requires a type as its argument. This will never be 8 because it will always be an error.
VS2010 compiles a C# managed integer as 4 bytes, even on a 64-bit machine.
Correct. I note that section 18.5.8 of the C# specification defines sizeof(int) as being the compile-time constant 4. That is, when you say sizeof(int) the compiler simply replaces that with 4; it is just as if you'd said "4" in the source code.
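A quick sketch of that distinction (testInt stands in for the variable from the question):
using System;

int testInt = 5;
// int bad = sizeof(testInt);   // does not compile: sizeof takes a type, not a variable
int size = sizeof(int);         // the compiler replaces this with the constant 4
Console.WriteLine($"{size}, {testInt}");   // prints "4, 5"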
Does anyone know if/when the time will come that a standard "int" in C# will be 64 bits?
Never. Section 4.1.4 of the C# specification states that "int" is a synonym for "System.Int32".
If what you want is a "pointer-sized integer" then use IntPtr. An IntPtr changes its size on different architectures.
int is always synonymous with Int32 on all platforms.
It's very unlikely that Microsoft will change that in the future, as it would break lots of existing code that assumes int is 32-bits.
I think what you may be confused by is that int is an alias for Int32, so it will always be 4 bytes, but IntPtr is supposed to match the word size of the CPU architecture, so it will be 4 bytes on a 32-bit system and 8 bytes on a 64-bit system.
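A small sketch of the difference (the second line's output depends on whether the process runs as 32-bit or 64-bit):
using System;

Console.WriteLine(sizeof(int));   // always 4, on every platform
Console.WriteLine(IntPtr.Size);   // 4 in a 32-bit process, 8 in a 64-bit process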
According to the C# specification ECMA-334, section "11.1.4 Simple Types", the reserved word int will be aliased to System.Int32. Since this is in the specification it is very unlikely to change.
No matter whether you're using the 32-bit version or 64-bit version of the CLR, in C# an int will always mean System.Int32 and long will always mean System.Int64.
The following will always be true in C#:
sbyte signed 8 bits, 1 byte
byte unsigned 8 bits, 1 byte
short signed 16 bits, 2 bytes
ushort unsigned 16 bits, 2 bytes
int signed 32 bits, 4 bytes
uint unsigned 32 bits, 4 bytes
long signed 64 bits, 8 bytes
ulong unsigned 64 bits, 8 bytes
An integer literal is just a sequence of digits (e.g. 314159) without any of these explicit types. C# assigns it the first type in the sequence (int, uint, long, ulong) in which it fits. This seems to have been slightly muddled in at least one of the responses above.
Weirdly the unary minus operator (minus sign) showing up before a string of digits does not reduce the choice to (int, long). The literal is always positive; the minus sign really is an operator. So presumably -314159 is exactly the same thing as -((int)314159). Except apparently there's a special case to get -2147483648 straight into an int; otherwise it'd be -((uint)2147483648). Which I presume does something unpleasant.
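A short sketch of the literal-typing rule and the int.MinValue special case (the printed names are the CLR type names):
using System;

var a = 2147483647;             // int: fits in the first candidate type
var b = 2147483648;             // uint: too large for int, so the next candidate is picked
var c = 4294967296;             // long
var d = 9223372036854775808;    // ulong
var e = -2147483648;            // int: special-cased so int.MinValue can be written directly

Console.WriteLine($"{a.GetType()} {b.GetType()} {c.GetType()} {d.GetType()} {e.GetType()}");
// System.Int32 System.UInt32 System.Int64 System.UInt64 System.Int32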
Somehow it seems safe to predict that C# (and friends) will never bother with "squishy name" types for >=128 bit integers. We'll get nice support for arbitrarily large integers and super-precise support for UInt128, UInt256, etc. as soon as processors support doing math that wide, and hardly ever use any of it. 64-bit address spaces are really big. If they're ever too small it'll be for some esoteric reason like ASLR or a more efficient MapReduce or something.
Yes, as Jon said, and unlike the 'C/C++ world', Java and C# aren't dependent on the system they're running on. They have strictly defined lengths for byte/short/int/long and single/double precision floats, equal on every system.
An integer literal without a suffix can end up as a 32-bit or a 64-bit type; it depends on the value it represents.
as defined in MSDN:
When an integer literal has no suffix, its type is the first of these types in which its value can be represented: int, uint, long, ulong.
Here is the address:
https://msdn.microsoft.com/en-us/library/5kzh1b5w.aspx
Given this field:
char lookup_ext[8192] = {0}; // Gets filled later
And this statement:
unsigned short *slt = (unsigned short*) lookup_ext;
What happens behind the scenes?
lookup_ext[1669] returns 67 = 0100 0011 (C), lookup_ext[1670] returns 78 = 0100 1110 (N) and lookup_ext[1671] returns 68 = 0100 0100 (D); yet slt[1670] returns 18273 = 0100 0111 0110 0001.
I'm trying to port this to C#, so besides an easy way out of this, I'm also wondering what really happens here. Been a while since I used C++ regularly.
Thanks!
The statement that you show doesn't cast a char to an unsigned short, it casts a pointer to a char to a pointer to an unsigned short. This means that the usual arithmetic conversions of the pointed-to-data are not going to happen and that the underlying char data will just be interpreted as unsigned shorts when accessed through the slt variable.
Note that sizeof(unsigned short) is unlikely to be one, so that slt[1670] won't necessarily correspond to lookup_ext[1670]. It is more likely - if, say, sizeof(unsigned short) is two - to correspond to lookup_ext[3340] and lookup_ext[3341].
Do you know why the original code is using this aliasing? If it's not necessary, it might be worth trying to make the C++ code cleaner and verifying that the behaviour is unchanged before porting it.
If I understand correctly, the type conversion will be converting a char array of size 8192 to a short int array of size half of that, which is 4096.
So I don't understand what you are comparing in your question. slt[1670] should correspond to lookup_ext[1670*2] and lookup_ext[1670*2+1].
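Since the question mentions porting to C#, here is a minimal sketch of one way to read the same two-byte values out of a byte[] table (the names mirror the question; this assumes the data uses the machine's native byte order, just like the C++ pointer cast did):
using System;

byte[] lookup_ext = new byte[8192];   // filled elsewhere, like the C++ array

// Equivalent of slt[i]: combine the two bytes at positions i * 2 and i * 2 + 1
// into a single unsigned 16-bit value.
ushort ReadUInt16At(byte[] table, int i)
{
    return BitConverter.ToUInt16(table, i * 2);
}

ushort value = ReadUInt16At(lookup_ext, 1670);   // reads lookup_ext[3340] and lookup_ext[3341]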
Well, this statement
char lookup_ext[8192] = {0}; // Gets filled later
Creates an array either locally or non-locally, depending on where the definition occurs. Initializing it like that, with an aggregate initializer, will initialize all its elements to zero (the first explicitly, the remaining ones implicitly). Therefore I wonder why your program outputs non-zero values - unless the fill happens before the read, in which case that makes sense.
unsigned short *slt = (unsigned short*) lookup_ext;
That will interpret the bytes making up the array as unsigned short objects when you read from that pointer's target. Strictly speaking, the above is undefined behavior, because you can't be sure the array is suitably aligned, and you would be reading through a pointer whose type doesn't match the original pointed-to type (unsigned char <-> unsigned short). In C++, the only portable way to read the value out of some other POD (plain old data - broadly speaking, all the structs and simple types that are possible in C too, such as short) is by using library functions such as memcpy or memmove.
So if you read *slt above, you would interpret the first sizeof(*slt) bytes of the array and try to read them as an unsigned short (that's called a type pun).
When you do "unsigned short slt = (unsigned short) lookup_ext;", the no. of bytes equivalent to the size of (unsigned short) are picked up from the location given by lookup_ext, and stored at the location pointed to by slt. Since unsigned short would be 2 bytes, first two bytes from lookup_ext would be stored in the location given by slt.