Why is Array.Length an int, and not a uint? [duplicate] - c#

This question already has answers here:
Why does .NET use int instead of uint in certain classes?
(7 answers)
Closed 9 years ago.
Why is Array.Length an int, and not a uint? This bothers me (just a bit) because a length value can never be negative.
This also forced me to use an int for a length property on my own class, because if I made it a uint, every int value assigned to it would need an explicit cast...
So the ultimate question is: is there any use for an unsigned int (uint)? Even Microsoft seems not to use them.

An unsigned int isn't CLS-compliant and would therefore restrict usage of the property to the languages that do implement a UInt.
See here:
Framework 1.1
Introduction to the .NET Framework Class Library
Framework 2.0
.NET Framework Class Library Overview
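To make the CLS point concrete, here is a minimal sketch (the class and property names are made up for illustration) of the warning the compiler produces when a CLS-compliant assembly exposes a uint property:
using System;

[assembly: CLSCompliant(true)]

public class Packet
{
    // Warning: the type of this public member is not CLS-compliant, because
    // languages without unsigned integers could not consume it.
    public uint Length { get; set; }

    // An int property raises no such warning.
    public int Count { get; set; }
}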

Many reasons:
uint is not CLS compliant, thus making a built in type (array) dependent on it would have been problematic
The runtime as originally designed prohibits any object on the heap from occupying more than 2GB of memory. Since the largest array that fits within this limit would be new byte[int.MaxValue], it would be puzzling to be able to express positive array lengths that are nevertheless illegal.
Note that this limitation has been somewhat removed in the 4.5 release, though the standard Length as int remains.
Historically C# inherits much of its syntax and conventions from C and C++. In those languages array access is simply pointer arithmetic, so negative array indexing was possible (though normally illegal and dangerous). Since much existing code assumes that the array index is signed, this would have been a factor.
On a related note, the use of signed integers for array indexes in C/C++ means that interop with those languages and with unmanaged functions would require the use of ints in those circumstances anyway, and the inconsistency would be confusing.
The BinarySearch implementation (a very useful component of many algorithms) relies on being able to use the negative range of the int to indicate that the value was not found and the location at which such a value should be inserted to maintain sorting.
When operating on an array it is likely that you would want to take a negative offset from an existing index. If you used an offset that would take you past the start of the array, then with a uint the wrap-around behaviour would make your index possibly legal (in that it is positive). With an int the result would be obviously illegal, but still safe, since the runtime guards against reading invalid memory. (See the sketch after this list.)
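A minimal sketch of the last two points, using Array.BinarySearch and a uint offset (the variable names are mine, and the snippet assumes a using System; directive):
int[] sorted = { 1, 3, 5, 7 };
// BinarySearch uses the negative range of int to say "not found": the result
// is the bitwise complement of the index where the value should be inserted.
int pos = Array.BinarySearch(sorted, 4);   // -3
if (pos < 0)
    Console.WriteLine(~pos);               // 2: insertion point that keeps the array sorted

uint index = 1, offset = 3;
// With uint arithmetic the subtraction silently wraps around to a huge positive
// number instead of producing an obviously invalid negative index.
uint wrapped = index - offset;             // 4294967294
int signedResult = 1 - 3;                  // -2: clearly out of range and easy to check for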

I think it might also have to do with simplifying things at a lower level: Array.Length will of course be added to a negative number at some point, and if Array.Length were unsigned and were added to a negative int (two's complement), the results could be messy.

It looks like nobody has provided an answer to "the ultimate question".
I believe the primary use of unsigned ints is to provide easier interfacing with external systems (P/Invoke and the like) and to cover the needs of the various languages being ported to .NET.

Typically, integer values are signed, unless you explicitly need an unsigned value. It's just the way they are used. I may not agree with that choice, but that's just the way it is.
For the time being, with today's typical memory constraints, if your array or similar data structure needs a UInt32 length, you should consider other data structures.
With an array of bytes, Int32 will give you 2GB of values.

Related

Opposite behavior of Marshal.SizeOf and sizeof operator for boolean and char data types in C#

I was comparing the Marshal.SizeOf API with the sizeof operator in C#. Their outputs for the char and bool data types are a little surprising. Here are the results:
For Boolean:
Marshal.SizeOf = 4
sizeof = 1
For char:
Marshal.SizeOf = 1
sizeof = 2
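For reference, a small sketch of how these numbers can be reproduced (assuming the values above come from Marshal.SizeOf(Type) and the sizeof operator applied to the built-in types):
using System;
using System.Runtime.InteropServices;

class SizeComparison
{
    static void Main()
    {
        Console.WriteLine(Marshal.SizeOf(typeof(bool)));  // 4: marshaled size
        Console.WriteLine(sizeof(bool));                  // 1: CLR size
        Console.WriteLine(Marshal.SizeOf(typeof(char)));  // 1: marshaled size
        Console.WriteLine(sizeof(char));                  // 2: CLR size
    }
}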
On this MSDN link I found the following text:
For all other types, including structs, the sizeof operator can be used only in unsafe code blocks. Although you can use the Marshal.SizeOf method, the value returned by this method is not always the same as the value returned by sizeof. Marshal.SizeOf returns the size after the type has been marshaled, whereas sizeof returns the size as it has been allocated by the common language runtime, including any padding.
I do not know a lot about the technicalities of marshaling, but I understand it has something to do with how the runtime converts types when they cross into native code. Going by that logic, for bool the size changes from 1 to 4. But for char it goes the other way (from 2 to 1), which is puzzling to me; I thought the size for char would also increase, the way it did for bool. Can someone help me understand these conflicting behaviors?
Sorry, you really do have to consider the technicalities to make sense of these choices. The target language for P/Invoke is the C language, a very old language by modern standards with a lot of history, used on a lot of different machine architectures. It makes very few assumptions about the size of a type; even the notion of a byte does not exist. That made the language very easy to port to the kinds of machines that were common back when C was invented, and to the unusual architectures used in supercomputers and digital signal processors.
C did not originally have a bool type. Logical expressions instead use int, where a value of 0 represents false and any other value represents true. That also carried forward into the winapi, which does use a BOOL type, an alias for int. So 4 was the logical choice. But it is not a universal choice and you have to watch out: many C++ implementations use a single byte, and COM Automation chose two bytes.
C does have a char type, the only guarantee is that it has at least 8 bits. Whether it is signed or unsigned is unspecified, most implementations today use signed. Support for an 8-bit byte is universal today on the kind of architectures that can execute managed code so char is always 8 bits in practice. So 1 was the logical choice.
That doesn't make you happy; nobody is happy about it, since you can't support text written in an arbitrary language with an 8-bit character type. Unicode came about to solve the disaster of the many possible 8-bit encodings that were in use, but it did not have much of an effect on the C and C++ languages. Their committees did add wchar_t (wide character) to the standard, but in keeping with old practices they did not nail down its size. Which made it useless, forcing C++ to later add char16_t and char32_t. It is however always 16 bits in compilers that target Windows, since that is the operating system's choice for characters (aka WCHAR). It is not in the various Unix flavors; they favor utf8.
That works well in C# too; you are not stuck with 1-byte characters. Every single type in the .NET Framework has an implicit [StructLayout] attribute with a CharSet property. The default is CharSet.Ansi, matching the C language default. You can however easily apply your own and pick CharSet.Unicode. You then get two bytes per character, using the utf16 encoding, and the string is copied as-is since .NET also uses utf16. Making sure that the native code expects strings in that encoding is, however, up to you.
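A hedged sketch of that CharSet effect (the struct names are made up; the exact marshaled sizes can depend on the platform's defaults):
using System;
using System.Runtime.InteropServices;

// Default marshaling (CharSet.Ansi): char is marshaled as a 1-byte ANSI character.
[StructLayout(LayoutKind.Sequential)]
struct AnsiChar { public char C; }

// Explicit CharSet.Unicode: char is marshaled as a 2-byte UTF-16 code unit.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
struct WideChar { public char C; }

class CharSetDemo
{
    static void Main()
    {
        Console.WriteLine(Marshal.SizeOf(typeof(AnsiChar)));  // typically 1
        Console.WriteLine(Marshal.SizeOf(typeof(WideChar)));  // typically 2
    }
}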

what is the exact maximum limit of elements in an array

This is a purely theoretical question, so please do not warn me of that in your answers.
If I am not mistaken, every array in .NET is indexed by an Int32, meaning the index ranges from 0 to Int32.MaxValue.
Supposing no memory/GC constraints are involved, an array in .NET can therefore have up to 2147483648 (and not 2147483647) elements. Right?
Well, in theory that's true. In fact, in theory there could be support for larger arrays - see this Array.CreateInstance signature which takes long values for the lengths. You wouldn't be able to index such an array using the C# indexers, but you could use GetValue(long).
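A minimal sketch of those long-based overloads (using a deliberately small length, since whether a truly huge array can actually be allocated depends on the runtime limits discussed below):
// Array.CreateInstance accepts long lengths; the C# indexer syntax cannot reach
// beyond int range, but GetValue/SetValue have long overloads.
Array big = Array.CreateInstance(typeof(byte), new long[] { 100L });
big.SetValue((byte)42, 50L);
Console.WriteLine(big.GetValue(50L));   // 42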
However, in practical terms, I don't believe any implementation supports such huge arrays. The CLR has a per-object limit a bit short of 2GB, so even a byte array can't actually have 2147483648 elements. A bit of experimentation shows that on my box, the largest array you can create is new byte[2147483591]. (That's on the 64 bit .NET CLR; the version of Mono I've got installed chokes on that.)
EDIT: Just looking at the CLI spec, it specifies that arrays have a lower bound and upper bound of an Int32. That would mean upper bounds over Int32.MaxValue are prohibited, even though they can be expressed with the Array.CreateInstance calls. However, it also means it's permissible to have an array with bounds Int32.MinValue...Int32.MaxValue, i.e. 4294967296 elements in total.
EDIT: Looking again, ECMA 335 partition III section 4.20 (newarr) specifies that initializing a vector type with newarr has to take either a native int or an int32 value. So it looks like while the normally-more-lenient "array" type in CLI terminology has to have int32 bounds, a "vector" type doesn't.

Why do C# containers and GUI classes use int and not uint for size related members?

I usually program in C++, but for school I have to do a project in C#.
So I went ahead and coded the way I was used to in C++, but was surprised when the compiler complained about code like the following:
const uint size = 10;
ArrayList myarray = new ArrayList(size); // Argument 1: cannot convert from 'uint' to 'int'
OK, they expect int as the argument type, but why? I would feel much more comfortable with uint as the argument type, because uint fits much better in this case.
Why do they use int as the argument type pretty much everywhere in the .NET library, even though for many cases negative numbers don't make any sense (since no container nor GUI element can have a negative size)?
If the reason they used int is that they didn't expect the average user to care about signedness, why didn't they additionally add overloads for uint?
Is this just MS not caring about sign correctness, or are there cases where negative values make some sense / carry some information (an error code?) for container/GUI widget/... sizes?
I would imagine that Microsoft chose Int32 because UInt32 is not CLS-compliant (in other words not all languages that use the .NET framework support unsigned integers).
Because unsigned integers are not CLS compliant. There are languages that are missing support for them, Java would be an example.
In addition to the answers talking about CLS compliance, consider that integer math (e.g. 10 + 2) results in integer (as in signed) data, which only makes sense; now consider the bother of having to cast every mathematical expression to uint to pass it to one of the methods you refer to.
As for overloads that take uint -- in many cases method arguments are stored as (or used to calculate) property values, which are usually of type int (again for CLS compliance, and possibly for integer math convenience); the discrepancy in sign would be confusing, not to mention vulnerable to overflow.
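To illustrate that casting friction, here is a small sketch with a hypothetical method that insists on a uint size parameter (the names are made up for illustration):
class CastFriction
{
    // Hypothetical API taking uint for a size, purely for illustration.
    static void Resize(uint newLength) { /* ... */ }

    static void Main()
    {
        int current = 10;
        // Resize(current + 2);          // compile error: cannot convert from 'int' to 'uint'
        Resize((uint)(current + 2));     // every call site needs an explicit cast
    }
}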
Stroustrup prefers int over "uint" in The C++ Programming Language, and I think his reasons apply to C# too:
It's about underflow with no warning:
// Unintended very large uint: the subtraction silently wraps around
uint five = 5, ten = 10;
uint oops = five - ten;   // 4294967291, not -5
or:
// Unintended infinite loop: counter >= 0 is always true for a uint
for (uint counter = 10; counter >= 0; counter--)
{
    // Do something
}
The extra bit of info is rarely worth having to watch for these kinds of bugs.
This possibly comes a little late, but I just found the question and want to add a missing bit.
A prime example where negative values make perfect sense is in graphical frameworks. For sizes, as stated in the question, negatives are out of the question, but for position values it is perfectly acceptable to have negative values. Such values make objects appear off-screen or at least partially cropped.
It follows the very same principle as in mathematics: negative coordinates just place points on the opposite side of the direction in which the axis values grow. Assuming that (0,0) is at the upper-left corner of the screen, negative values displace things to the left of and above that point, making them only partially visible.
This is useful, for example, if you want to implement a scrolling region where the contents are larger than the available space: object positions simply become negative to start disappearing off the top, or larger than the height to disappear off the bottom.
Such things aren't limited to C#. WinForms and WPF use this, as mentioned in the question, but most other graphical environments behave the same way. HTML+CSS can place elements in the same way, and the C/C++ library SDL can also make use of this effect.
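As a small hedged WinForms sketch (assuming a project that references System.Windows.Forms), a negative Location is perfectly legal and simply crops the control against its container's top-left edge:
using System.Drawing;
using System.Windows.Forms;

var form = new Form { Width = 300, Height = 200 };
var button = new Button { Text = "Half visible", Width = 120, Height = 40 };
button.Location = new Point(-60, -20);   // partially off-screen to the top-left
form.Controls.Add(button);
Application.Run(form);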

Why does .NET use int instead of uint in certain classes?

I always come across code that uses int for things like .Count, etc, even in the framework classes, instead of uint.
What's the reason for this?
UInt32 is not CLS compliant so it might not be available in all languages that target the Common Language Specification. Int32 is CLS compliant and therefore is guaranteed to exist in all languages.
int, in C, is specifically defined to be the natural integer type of the processor, and is therefore held to be the fastest for general numeric operations.
Unsigned types only behave like whole numbers if the sum or product of a signed and unsigned value will be a signed type large enough to hold either operand, and if the difference between two unsigned values is a signed value large enough to hold any result. Thus, code which makes significant use of UInt32 will frequently need to compute values as Int64. Operations on signed integer types may fail to operate like whole numbers when the operands are overly large, but they'll behave sensibly when operands are small. Operations on unpromoted arguments of unsigned types pose problems even when operands are small.
Given UInt32 x, for example, the inequality x-1 < x will fail for x==0 if the result type is UInt32, and the inequality x<=0 || x-1>=0 will fail for large x values if the result type is Int32. Only if the operation is performed on type Int64 can both inequalities be upheld.
While it is sometimes useful to define unsigned-type behavior in ways that differ from whole-number arithmetic, values which represent things like counts should generally use types that will behave like whole numbers--something unsigned types generally don't do unless they're smaller than the basic integer type.
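A short sketch of that first inequality, showing where unsigned arithmetic stops behaving like whole-number math (assuming a using System; directive):
uint x = 0, one = 1;
// x - 1 wraps around to uint.MaxValue, so the "obvious" inequality is false.
Console.WriteLine(x - one < x);   // False when x == 0

int y = 0;
// Signed arithmetic keeps behaving like whole numbers for small operands.
Console.WriteLine(y - 1 < y);     // True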
UInt32 isn't CLS-Compliant. http://msdn.microsoft.com/en-us/library/system.uint32.aspx
I think that over the years people have come to the conclusions that using unsigned types doesn't really offer that much benefit. The better question is what would you gain by making Count a UInt32?
Some things use int so that they can return -1 as if it were "null" or something like that. For example, a ComboBox will return -1 for its SelectedIndex if it doesn't have any item selected.
If the number is truly unsigned by its intrinsic nature then I would declare it an unsigned int. However, if I just happen to be using a number (for the time being) in the positive range then I would call it an int.
The main reasons being that:
It avoids having to do a lot of type-casting as most methods/functions are written to take an int and not an unsigned int.
It eliminates possible truncation warnings.
You invariably end up wishing you could assign a negative value to the number that you had originally thought would always be positive.
Are just a few quick thoughts that came to mind.
I used to try to be very careful and choose the proper unsigned/signed type, and I finally realized that it doesn't really result in a positive benefit. It just creates extra work. So why make things hard by mixing and matching?
Some old libraries, and even InStr, use negative numbers to mean special cases. I believe it's either laziness or that the negative values carry special meanings.

Converting to int16, int32, int64 - how do you know which one to choose?

I often have to take a retrieved value (usually a string) and convert it to an int. But in C# (.NET) you have to choose either Int16, Int32 or Int64 - how do you know which one to choose when you don't know how big your retrieved number will be?
Everyone here who has mentioned that declaring an Int16 saves RAM should get a downvote.
The answer to your question is to use the keyword "int" (or if you feel like it, use "Int32").
That gives you a range of roughly plus or minus 2.1 billion... Also, 32-bit processors will handle those ints better... also (and THE MOST IMPORTANT REASON) is that if you plan on using that int for almost any reason... it will likely need to be an "int" (Int32).
In the .Net framework, 99.999% of numeric fields (that are whole numbers) are "ints" (Int32).
Example: Array.Length, Process.ID, Windows.Width, Button.Height, etc, etc, etc 1 million times.
EDIT: I realize that my grumpiness is going to get me down-voted... but this is the right answer.
Just wanted to add that... I remembered that in the days of .NET 1.1 the compiler was optimized so that 'int' operations are actually faster than byte or short operations.
I believe it still holds today, but I'm running some tests now.
EDIT: I have got a surprise discovery: the add, subtract and multiply operations for short(s) actually return int!
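A quick sketch of that promotion rule (this is language-level behavior rather than a compiler optimization):
short a = 1, b = 2;
// The arithmetic operators are not defined for short; both operands are
// promoted to int, so the result must be cast back explicitly.
// short c = a + b;          // compile error: cannot implicitly convert 'int' to 'short'
short c = (short)(a + b);
Console.WriteLine(c);        // 3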
Repeatedly trying TryParse() doesn't make sense; you have a field already declared. You can't change your mind unless you make that field of type Object, which is not a good idea.
Whatever data the field represents has a physical meaning: it's an age, a size, a count, etc. Physical quantities have realistic constraints on their range. Pick the int type that can store that range. Don't try to fix an overflow; it would be a bug.
Contrary to the current most popular answer, shorter integers (like Int16 and SByte) do often take up less space in memory than larger integers (like Int32 and Int64). You can easily verify this by instantiating large arrays of sbyte/short/int/long and using perfmon to measure managed heap sizes. It is true that many CLR flavors will widen these integers for CPU-specific optimizations when doing arithmetic on them, but when stored as part of an object, they take up only as much memory as is necessary.
So, you definitely should take size into consideration especially if you'll be working with large list of integers (or with large list of objects containing integer fields). You should also consider things like CLS-compliance (which disallows any unsigned integers in public members).
For simple cases like converting a string to an integer, I agree an Int32 (C# int) usually makes the most sense and is likely what other programmers will expect.
If we're just talking about a couple of numbers, choosing the largest won't make a noticeable difference in your overall RAM usage and will just work. If you are talking about lots of numbers, you'll need to use TryParse() on them and figure out the smallest int type, to save RAM.
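A rough sketch of that "smallest type that fits" idea (the helper name is made up, and boxing the result into object is just for illustration):
using System;

static class SmallestFit
{
    // Try the narrower types first and return the first one the value fits in.
    public static object ParseSmallest(string text)
    {
        if (short.TryParse(text, out short s)) return s;   // fits in Int16
        if (int.TryParse(text, out int i)) return i;       // fits in Int32
        if (long.TryParse(text, out long l)) return l;     // fits in Int64
        throw new OverflowException("Value does not fit in Int64.");
    }
}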
All computers are finite. You need to define an upper limit based on what you think your users requirements will be.
If you really have no upper limit and want to allow 'unlimited' values, try adding the .NET Java runtime libraries to your project, which will allow you to use the java.math.BigInteger class - which does math on integers of nearly unlimited size.
Note: The .Net Java libraries come with full DevStudio, but I don't think they come with Express.
