Looking at the C# numeric data types, I noticed that most of the types have a signed and unsigned version. I noticed that whereas the "default" integer, short and long are signed, and have their unsigned counterpart as uint, ushort and ulong; the "default" byte is instead unsigned - and have a signed counterpart in sbyte.
Just out of curiosity, why is byte so different from the rest? Was there a specific reason behind this or it is "just the way things are"?
Hope the question isn't too confusing due to my phrasing and excessive use of quotes. Heh..
I would say a byte is not considered a numeric type but defines a structure with 8 bits in size. Besides there is no signed byte notion, it is unsigned. Numbers on the otherhand are firstly considered to be signed, so stating they are unsigned which is less common warrants the prefix
[EDIT]
Forgot there is a signed byte (sbyte). I suppose it is rather historical and practical application. Ints are more common than UInts and byte is more common than sbyte.
Historically the terms byte, nibble and bit indicate a unit of storage, a mnemonic or code...not a numeric value. Having negative mega-bytes of memory or adding ASCII codes 1 and 2 expecting code 3 is kinda silly. In many ways there is no such thing as a signed "byte". Sometimes the line between "thing" and "value" is very blurry....as with most languages that treat byte as a thing and a value.
It's more so a degree of corruption of the terms. A byte is not inherently numeric in any form, it's simply a unit of storage.
However, bytes, characters, and 8-bit signed/unsigned integers have had their names used interchangeably where they probably should not have:
Byte denotes 8 bits of data, says
nothing about the format of the data.
Character denotes some data that
stores a representation of a single
text character.
"UINT8"/"INT8" denotes 8 bits of
data, in signed or unsigned format,
storing numeric integer values.
It really just comes down to being intuitive versus being consistent. It probably would have been cleaner if the .NET Framework used System.UInt8 and System.Int8 for consistency with the other integer types. But yeah it does seem a bit arbitrary.
For what it's worth MSIL (which all .NET languages compile to anyhow) is more consistent in that a sbyte is called an int8 and a byte is called an unsigned int8, short is called int16, etc.
But the term byte is typically not used to describe a numeric type but rather a set of 8 bits such as when dealing with files, serialization, sockets, etc. For example if Stream.Read worked with a System.Int8[] array, that would be a very unusual looking API.
Related
I seek to get a few thing confirmed regarding the way narrowing casts work with integral types in C# (5.0, .NET Framework 4.0/4.5). Bottom line: Can I be sure that the underlying bytes of integral types stay the same, both in order and value, when casting between signed and unsigned?
Let's say that I do the following:
short shortVal = -20000;
ushort ushortVal = (ushort)shortVal;
Now, the experiments I've done so far, shows me that the bytes in following two byte arrays:
byte[] shortBytes = BitConverter.GetBytes(shortVal);
byte[] ushortBytes = BitConverter.GetBytes(ushortVal);
do NOT differ. I have done this exact experiment with an explicit narrowing cast from short to ushort with the value of shortVal in the range from Int16.MinValue to Int16.MaxValue. All 2^16 cases checks out fine. The actual interpreted value though, is naturally re-interpreted since the bytes stay the same. I assume the signed integral types use two's complement to represent signed values (is this true?)
I need to know, if I can count on these conversion always being "byte-safe" - as in not changing the underlying bytes and their order. This also goes for conversion the other way, from unsigned to signed. Are these conversion exact reverses of each other? I am focused mostly on short/ushort and int/uint. But all integral types are of interest.
These detail are likely up to implementation of the technology behind C# and the CLR. I am here strictly focused on the CLR for Windows 32/64 bit.
This can be quite tricky.
CLR does use two's complement for signed integral number representation on x86/x64 architectures (as you observed in your test), and it's unlikely to change in the near future, because the architecture itself has good support for it. It's safe to assume it will stay like this for a while.
On the other hand, I haven't found any mention of this in either CLI or C# specifications, so you can't count on it in general, especially in face of other architectures and/or CLI implementations.
So it depends on what you want to use this on. I would stay away from depending on implementation details like this if possible, and use higher level serialization tools to convert to/from any binary representation.
Native method from dll works in java if the input parameter is array of bytes - byte[].
If we use the same method from c# it throws EntryPointNotFoundException.
Is that because of byte[] in java and c# are different things? and if it's so how should I use native function from c#?
Java lacks the unsigned types. In particular, Java lacks a primitive type for an unsigned byte. The Java byte type is signed, while the C# byte is unsigned and sbyte is signed.
Is that because of byte[] in java and c# are different things?
Yes.
Endianness: Java stores things internally as Big Endian, while .NET is Little Endian by default.
Signedness: C# bytes are unsigned. Java bytes are signed.
See different results when converting int to byte array - .NET vs Java.
What's the signature of the native function? How do you declare it in Java and in C#?
The most common reason for EntryPointNotFoundException is that function name is mangled (esp. true if function is written in C++) or misspelled.
Another source of problem is 'W' and 'A' suffixes for WinAPI function used to distinguish ANSI and Unicode versions of functions. .NET interop mechanism can try to guess the function suffix, so that may be the source of confusion,
Java Byte:
java byte: The byte data type is an 8-bit signed two's complement integer. It has a minimum value of -128 and a maximum value of 127 (inclusive). The byte data type can be useful for saving memory in large arrays, where the memory savings actually matters. They can also be used in place of int where their limits help to clarify your code; the fact that a variable's range is limited can serve as a form of documentation.
more for Java Byte
C# Byte
Byte Represents an 8-bit unsigned integer,Byte is an immutable value type that represents unsigned integers with values that range from 0 (which is represented by the Byte.MinValue constant) to 255 (which is represented by the Byte.MaxValue constant). The .NET Framework also includes a signed 8-bit integer value type, SByte, which represents values that range from -128 to 127.
more for c# Byte
I came from a mostly C/C++ background before I began using C#. One of the things I did with my first project in C# was make a class like this
class Element{
public uint Size;
public ulong BigThing;
}
I was then mortified by the fact that this requires casting:
int x=MyElement.Size;
as does
int x=5;
uint total=MyElement.Size+x;
Why did the language designers decide to make the signed and unsigned integer types not implicitly castable? And why are the unsigned types not used more throughout the .Net library? For instance String.Length can never be negative, yet it is a signed integer.
Why did the language designers decide to make the signed and unsigned integer types not
implicitly castable?
Because that could lose data or throw any exception, neither of which is generally a good thing to allow implicitly. (The implicit conversion from long to double can lose data too, admittedly, but in a different way.)
And why are the unsigned types not used more throughout the .Net library
Unsigned types aren't CLS-compliant - not all .NET languages have always supported them. For example, Visual Basic didn't have "native" support for unsigned data types in .NET 1.0 and 1.1; it was added to the language for 2.0. (You could still use them, but they weren't part of the language itself - you couldn't use the normal arithmetic operators, for example.)
Along with Jon's answer, just because an unsigned number can't be negative doesn't mean it isn't bigger than a signed one. uint is 0 to 4,294,967,295 but int is -2,147,483,648 to 2,147,483,647. Plenty of room above int's max for loss.
Because implicitly converting an unsigned integer of 3B into an signed integer is going to blow up.
Unsigned has twice the maximum value of signed. It's the same reason you can't cast a long to an int.
I was then mortified by the fact that this requires casting:
int x=MyElement.Size;
But you are contradicting yourself here. If you really (really) need Size to be unsigned than assigning it to (signed) x is an error. A deep flaw in your code.
For instance String.Length can never be negative, yet it is a signed integer
But String.IndexOf can return a negative number, and it would be awkward if String.Length and Index values where of different types.
And while in theory there would be merit in an unsigned String.Length (4 GB cap), in practice even the current 2GB is large enough (because strings of that length are rare and unworkable anyway).
So the real answer is: Why use unsigned in the first place?
On the second count: because they wanted the CLR to be compatible with languages that don't have unsigned datatypes (read: VB.NET).
In my C# source code I may have declared integers as:
int i = 5;
or
Int32 i = 5;
In the currently prevalent 32-bit world they are equivalent. However, as we move into a 64-bit world, am I correct in saying that the following will become the same?
int i = 5;
Int64 i = 5;
No. The C# specification rigidly defines that int is an alias for System.Int32 with exactly 32 bits. Changing this would be a major breaking change.
The int keyword in C# is defined as an alias for the System.Int32 type and this is (judging by the name) meant to be a 32-bit integer. To the specification:
CLI specification section 8.2.2 (Built-in value and reference types) has a table with the following:
System.Int32 - Signed 32-bit integer
C# specification section 8.2.1 (Predefined types) has a similar table:
int - 32-bit signed integral type
This guarantees that both System.Int32 in CLR and int in C# will always be 32-bit.
Will sizeof(testInt) ever be 8?
No, sizeof(testInt) is an error. testInt is a local variable. The sizeof operator requires a type as its argument. This will never be 8 because it will always be an error.
VS2010 compiles a c# managed integer as 4 bytes, even on a 64 bit machine.
Correct. I note that section 18.5.8 of the C# specification defines sizeof(int) as being the compile-time constant 4. That is, when you say sizeof(int) the compiler simply replaces that with 4; it is just as if you'd said "4" in the source code.
Does anyone know if/when the time will come that a standard "int" in C# will be 64 bits?
Never. Section 4.1.4 of the C# specification states that "int" is a synonym for "System.Int32".
If what you want is a "pointer-sized integer" then use IntPtr. An IntPtr changes its size on different architectures.
int is always synonymous with Int32 on all platforms.
It's very unlikely that Microsoft will change that in the future, as it would break lots of existing code that assumes int is 32-bits.
I think what you may be confused by is that int is an alias for Int32 so it will always be 4 bytes, but IntPtr is suppose to match the word size of the CPU architecture so it will be 4 bytes on a 32-bit system and 8 bytes on a 64-bit system.
According to the C# specification ECMA-334, section "11.1.4 Simple Types", the reserved word int will be aliased to System.Int32. Since this is in the specification it is very unlikely to change.
No matter whether you're using the 32-bit version or 64-bit version of the CLR, in C# an int will always mean System.Int32 and long will always mean System.Int64.
The following will always be true in C#:
sbyte signed 8 bits, 1 byte
byte unsigned 8 bits, 1 byte
short signed 16 bits, 2 bytes
ushort unsigned 16 bits, 2 bytes
int signed 32 bits, 4 bytes
uint unsigned 32 bits, 4 bytes
long signed 64 bits, 8 bytes
ulong unsigned 64 bits, 8 bytes
An integer literal is just a sequence of digits (eg 314159) without any of these explicit types. C# assigns it the first type in the sequence (int, uint, long, ulong) in which it fits. This seems to have been slightly muddled in at least one of the responses above.
Weirdly the unary minus operator (minus sign) showing up before a string of digits does not reduce the choice to (int, long). The literal is always positive; the minus sign really is an operator. So presumably -314159 is exactly the same thing as -((int)314159). Except apparently there's a special case to get -2147483648 straight into an int; otherwise it'd be -((uint)2147483648). Which I presume does something unpleasant.
Somehow it seems safe to predict that C# (and friends) will never bother with "squishy name" types for >=128 bit integers. We'll get nice support for arbitrarily large integers and super-precise support for UInt128, UInt256, etc. as soon as processors support doing math that wide, and hardly ever use any of it. 64-bit address spaces are really big. If they're ever too small it'll be for some esoteric reason like ASLR or a more efficient MapReduce or something.
Yes, as Jon said, and unlike the 'C/C++ world', Java and C# aren't dependent on the system they're running on. They have strictly defined lengths for byte/short/int/long and single/double precision floats, equal on every system.
int without suffix can be either 32bit or 64bit, it depends on the value it represents.
as defined in MSDN:
When an integer literal has no suffix, its type is the first of these types in which its value can be represented: int, uint, long, ulong.
Here is the address:
https://msdn.microsoft.com/en-us/library/5kzh1b5w.aspx
I would like to understand why on .NET there are nine integer types: Char, Byte, SByte, Int16, UInt16, Int32, UInt32, Int64, and UInt64; plus other numeric types: Single, Double, Decimal; and all these types have no relation at all.
When I first started coding in C# I thought "cool, there's a uint type, I'm going to use that when negative values are not allowed". Then I realized no API used uint but int instead, and that uint is not derived from int, so a conversion was needed.
What are the real world application of these types? Why not have, instead, integer and positiveInteger ? These are types I can understand. A person's age in years is a positiveInteger, and since positiveInteger is a subset of integer there's so need for conversion whenever integer is expected.
The following is a diagram of the type hierarchy in XPath 2.0 and XQuery 1.0. If you look under xs:anyAtomicType you can see the numeric hierarchy decimal > integer > long > int > short > byte. Why wasn't .NET designed like this? Will the new framework "Oslo" be any different?
My guess would be because the underlying hardware breaks that class hierarchy. There are (perhaps surprisingly) many times when you care that a UInt32 is a 4 bytes big and unsigned, so a UInt32 is not a kind of Int32, nor is an Int32 a type of Int64.
And you almost always care about the difference between an int and a float.
Fundamentally, inheritance & the class hierarchy are not the same as mathematical set inclusion. The fact that the values a UInt32 can hold are a strict subset of the values an Int64 can hold does not mean that a UInt32 is a type of Int64. Less obviously, an Int32 is not a type of Int64 - even though there's no conceptual difference between them, their underlying representations are different (4 bytes versus 8 bytes). Decimals are even more different.
XPath is different: the representations for all the numeric types are fundamentally the same - a string of ASCII digits. There, the difference between a short and a long is one of possible range rather than representation - "123" is both a valid representation of a short and a valid representation of a long with the same value.
Decimal is intended for calculations that need precision (basically, money).
See here: http://msdn.microsoft.com/en-us/library/364x0z75(VS.80).aspx
Singles/Doubles are different to decimals, because they're intended to be an approximation (basically, for scientific calculations).
That's why they're not related.
As for bytes and chars, they're totally different: a byte is 0-255, whereas a char is a character, and can hence store unicode characters (there are a lot more than 255 of them!)
Uints and ints don't convert automatically, because they can each store values that are impossible for the other (uints have twice the positive range of ints).
Once you get the hang of it all, it actually does make a lot of sense.
As for your ages thing, i'd simply use an int ;)