Why don't Java, C# and C++ have ranges? - c#

Ada, Pascal and many other languages support ranges, a way to subtype integers.
A range is a signed integer value which ranges from a value (first) to another (last).
It's easy to implement a class that does the same in OOP, but I think that supporting the feature natively could let the compiler do additional static checks.
I know that it's impossible to verify statically that a variable defined in a range is not going to "overflow" at runtime, e.g. due to bad input, but I think that something could be done.
I am thinking of the Design by Contract approach (Eiffel) and Spec# (C# contracts), which give a more general solution.
Is there a simpler solution that checks, at least, static out-of-bound assignment at compile time in C++, C# and Java? Some kind of static-assert?
edit: I understand that "ranges" can be used for different purposes:
iterators
enumerators
integer subtype
I would focus on the latter, because the former are easily mapped onto C-family languages.
I am thinking of a closed set of values, such as a music volume, i.e. a range that goes from 0 up to 100. I would like to be able to increment or decrement it by a value, and to get a compile error in case of a statically detectable overflow, something like:
volume=rangeInt(0,100);
volume=101; // compile error!
volume=getIntFromInput(); // possible runtime exception
Thanks.

Subrange types are not actually very useful in practice. We do not often allocate fixed length arrays, and there is also no reason for fixed sized integers. Usually where we do see fixed sized arrays they are acting as an enumeration, and we have a better (although "heavier") solution to that.
Subrange types also complicate the type system. It would be much more useful to bring in constraints between variables than to fixed constants.
(Obligatory mention that integers should be arbitrary size in any sensible language.)

Ranges are most useful when you can do something over that range, concisely. That means closures. For Java and C++ at least, a range type would be annoying compared to an iterator because you'd need to define an inner class to define what you're going to do over that range.

Java has had an assert keyword since version 1.4. If you're doing programming by contract, you're free to use those to check proper assignment. And any mutable attribute inside an object that should fall within a certain range should be checked prior to being set. You can also throw an IllegalArgumentException.
Why no range type? My guess is that the original designers didn't see one in C++ and didn't consider it as important as the other features they were trying to get right.

For C++, a library for constrained-value variables is currently being implemented and will be proposed for the Boost libraries: http://student.agh.edu.pl/~kawulak/constrained_value/index.html

Pascal (and also Delphi) uses a subrange type, but it is limited to ordinal types (integer, char and even boolean).
It is primarily an integer with extra type checking. You can fake that in another language using a class. This has the advantage that you can apply more complex ranges.
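A minimal C# sketch of that class-based approach, applied to the volume example from the question. The `RangedInt` name and its members are hypothetical, not taken from any library. Note that the check happens at runtime, so assigning 101 becomes an exception rather than the compile error the question asks for.

```csharp
using System;

// Hypothetical integer constrained to [Min, Max]; violations are caught at runtime.
public readonly struct RangedInt
{
    public int Min { get; }
    public int Max { get; }
    public int Value { get; }

    public RangedInt(int min, int max, int value)
    {
        if (min > max)
            throw new ArgumentException("min must not exceed max");
        if (value < min || value > max)
            throw new ArgumentOutOfRangeException(nameof(value), value,
                $"Value must be between {min} and {max}.");
        Min = min;
        Max = max;
        Value = value;
    }

    // Returns a new instance with the same bounds; throws if newValue is out of range.
    public RangedInt With(int newValue) => new RangedInt(Min, Max, newValue);

    // Lets the value be used wherever a plain int is expected.
    public static implicit operator int(RangedInt r) => r.Value;
}
```

For example, `new RangedInt(0, 100, 50).With(101)` throws `ArgumentOutOfRangeException`, while `.With(75)` returns a new in-range value.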

I would add to Tom Hawtin's answer (with which I agree) that, for C++, the existence of ranges would not imply they would be checked, if you want to stay consistent with the general language behavior: array accesses, for instance, are not range-checked either.
For C# and Java, I believe the decision was based on performance - to check ranges would impose a burden and complicate the compiler.
Notice that ranges are mainly useful during the debugging phase - a range violation should never occur in production code (theoretically). So range checks are better to be implemented not inside the language itself, but in pre- and post- conditions, which can (should) be stripped out when producing the release build.
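In C#, one way to get checks that disappear from release builds is the `[Conditional("DEBUG")]` attribute, which is also how `Debug.Assert` (the closest C# counterpart to Java's `assert`) behaves. A sketch with hypothetical `Precondition` and `Player` names:

```csharp
using System;
using System.Diagnostics;

public static class Precondition
{
    // Calls to this method (including evaluation of their arguments) are
    // omitted by the compiler unless DEBUG is defined at the call site.
    [Conditional("DEBUG")]
    public static void Requires(bool condition, string message)
    {
        if (!condition)
            throw new ArgumentException(message);
    }
}

public class Player
{
    private int volume;

    public int Volume => volume;

    public void SetVolume(int value)
    {
        // Range check on the volume, present only in debug builds.
        Precondition.Requires(value >= 0 && value <= 100, "volume must be in [0, 100]");
        volume = value;
    }
}
```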

This is an old question, but I just wanted to update it. Java doesn't have ranges per se, but if you really want the functionality you can use Commons Lang, which has a number of range classes, including IntRange:
IntRange ir = new IntRange(1, 10);
Bizarrely, this doesn't exist in Commons Math. I kind of agree with the accepted answer in part, but I don't believe ranges are useless, particularly in test cases.

C++ allows you to implement such types through templates, and I think there are a few libraries available doing this already. However, I think in most cases, the benefit is too small to justify the added complexity and compilation speed penalty.
As for static assert, it already exists.
Boost has a BOOST_STATIC_ASSERT, and on Windows, I think Microsoft's ATL library defines a similar one.
boost::type_traits and boost::mpl are probably your best friends in implementing something like this.

The flexibility to roll your own is better than having it built into the language. What if you want saturating arithmetic for example, instead of throwing an exception for out of range values? I.e.
MyRange<0,100> volume = 99;
volume += 10; // results in volume==100
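A C# sketch of that saturating variant. C# generics cannot take integer constants the way the C++ `MyRange<0,100>` template does, so this hypothetical `SaturatingInt` takes its bounds as constructor arguments. It assumes .NET Core 2.0 or later for `Math.Clamp`.

```csharp
using System;

// Hypothetical saturating integer: results outside [Min, Max] are clamped
// to the nearest bound instead of throwing.
public readonly struct SaturatingInt
{
    public int Min { get; }
    public int Max { get; }
    public int Value { get; }

    public SaturatingInt(int min, int max, int value)
    {
        if (min > max)
            throw new ArgumentException("min must not exceed max");
        Min = min;
        Max = max;
        Value = Math.Clamp(value, min, max);
    }

    // The arithmetic is done in long so the intermediate result cannot overflow int.
    public static SaturatingInt operator +(SaturatingInt a, int delta) =>
        new SaturatingInt(a.Min, a.Max, ClampToRange((long)a.Value + delta, a.Min, a.Max));

    public static SaturatingInt operator -(SaturatingInt a, int delta) =>
        new SaturatingInt(a.Min, a.Max, ClampToRange((long)a.Value - delta, a.Min, a.Max));

    private static int ClampToRange(long v, int min, int max) =>
        (int)Math.Clamp(v, (long)min, (long)max);
}
```

With it, `var volume = new SaturatingInt(0, 100, 99); volume += 10;` leaves `volume.Value` at 100, matching the example above.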

In C# you can do this (Enumerable.Range takes a start value and a count, so i runs from 0 to 9):
foreach(int i in System.Linq.Enumerable.Range(0, 10))
{
// Do something
}

JSR-305 provides some support for ranges, but I don't know when, if ever, this will be part of Java.

Related

Why can I not encode const structs?

I wish to encode hard coded value of a const Point struct.
Why does the compiler allow neither built-in nor arbitrary user-defined structs to be replaced by constants during compilation? Since the internal bitwise representation can be established at compile time (in both cases), there is no apparent reason for the restriction.
My question is: is there a way to hard-code a predefined set of bytes in C# that can be interpreted at compile time as the appropriate type, given that all structs have a predetermined memory layout?
EDIT:
To clarify: Compile time means C# -> IL byte-code as stored in the output assembly.
The use case example:
public void Draw(Bitmap bmp, Point Location = new Point(0,0)) // invalid
This is an error because new Point(0,0) cannot be evaluated at compile time. I could take int X = 0, int Y = 0, or a nullable Point? Location = null, and create the struct inside the method; or I could overload the method without the optional parameters and have it call the main method with the default values, but that technique incurs a performance penalty in terms of the extra method calls required.
This may not be appropriate for all structs, since the constructor could rely on, or change, external state or randomness.
FINAL EDIT:
This is now possible, making the question moot. Yay.
The issue was the incorrect belief that the new keyword always implies heap allocation or dynamic stack allocation; with constant arguments, neither is the case.
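The final edit does not say which syntax made this work. What C# does accept as the default of a struct-typed optional parameter is `default(T)` or a parameterless `new T()`, and for a point at (0, 0) that is the same value as `new Point(0,0)`. A sketch, using a hypothetical `Point` stand-in and `Canvas` class rather than `System.Drawing`, so that it is self-contained:

```csharp
// Hypothetical stand-in for System.Drawing.Point.
public struct Point
{
    public int X;
    public int Y;
    public Point(int x, int y) { X = x; Y = y; }
}

public static class Canvas
{
    // Allowed: default(Point) (or new Point()) is a valid default value
    // for a struct parameter, and here it equals new Point(0, 0).
    public static string Draw(Point location = default(Point)) =>
        $"Drawing at ({location.X}, {location.Y})";

    // For a non-zero default, a nullable parameter plus ?? is one workaround.
    public static string DrawOffset(Point? location = null)
    {
        Point p = location ?? new Point(10, 10);
        return $"Drawing at ({p.X}, {p.Y})";
    }
}
```

For a non-zero default the nullable-parameter workaround mentioned in the question is still needed, as `DrawOffset` shows.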
All not-implemented features are not implemented for the same reason. To be implemented, a feature must be thought of, judged to be appropriate, designed, specified, implemented, tested, documented and shipped. All those things must happen. For your proposed feature, none of them happened. Therefore, no feature.
Programming language designers are not required to provide a justification for why a feature was not implemented. Rather, the people who want the feature are required to provide a reason why programming language designers should spend their valuable time implementing a feature that you want.
The C# design process is open, and the compiler source code is available. Why have you not designed and implemented the feature? If it is fair for you to ask the designers that question, it's fair for them to ask it of you! You're a computer programmer; get busy programming computers and build the feature if you think it is worthwhile, and then convince the language team to accept your pull request. If you don't think it is worth your time to do that, well, probably the language designers feel the same way.
I'm not sure what you mean by "at compile time"; can you clarify?
There are ways to store byte arrays in an assembly, sure. Make a C# program with a byte array initialized to all constant values and ildasm the assembly; you'll see the code that the C# compiler generates to get the byte array image out of the metadata and into the array.
You could implement similar shenanigans to get a byte array, fix the array in place, and then use unsafe pointer magic to reinterpret the array bytes as struct bytes. That sounds extraordinarily dangerous, and might mess up the performance of the garbage collector. I would not wish to do so myself, but you seem pretty keen on this feature, so go for it and report back what you find out!
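A safer variant of that idea, sketched under the assumption of .NET Core 2.1 or later: expose the bytes as a `ReadOnlySpan<byte>` over a constant array (which recent C# compilers emit as data in the assembly instead of allocating an array) and reinterpret them with `MemoryMarshal.Read<T>`, with no fixed pointers. The `Point` and `EmbeddedData` types are hypothetical, the struct must not contain references, and the byte order assumes a little-endian platform. The struct itself is still materialized at run time; only its bytes are fixed at compile time.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical blittable struct: two 32-bit ints laid out in order.
[StructLayout(LayoutKind.Sequential)]
public struct Point
{
    public int X;
    public int Y;
}

public static class EmbeddedData
{
    // Constant bytes for X = 3, Y = 4 in little-endian order.
    private static ReadOnlySpan<byte> SampleBytes => new byte[] { 3, 0, 0, 0, 4, 0, 0, 0 };

    // Reinterprets the embedded bytes as a Point.
    public static Point Sample
    {
        get
        {
            if (!BitConverter.IsLittleEndian)
                throw new PlatformNotSupportedException("This sketch assumes little-endian byte order.");
            return MemoryMarshal.Read<Point>(SampleBytes);
        }
    }
}
```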
Alternatively, C++/CLI probably implements the feature you want; I've never used it, but that seems like the sort of thing it would do. You could write a little program in C++/CLI that does what you want, and then either (1) use that program's assembly as a dependency of your assembly, (2) compile it as a netmodule and link it into your assembly via the usual netmodule linking gear (yuck), or (3) deduce how they implemented the feature and then do the same.
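You can convert a struct into a byte array and then encode the byte array. It may work like this.
Suppose your struct is as follows (the using directives are needed by the conversion method):
using System;
using System.Linq;
using System.Text;
public struct TestStruct
{
public int x;
public string y;
}
// Wrapper class added so the method compiles; the class name is arbitrary.
public static class TestStructBytes
{
// Layout: UTF-8 bytes of y, then the 4 bytes of x, then one trailing zero byte.
public static byte[] GetTestByte(TestStruct c)
{
var intGuy = BitConverter.GetBytes(c.x);
var stringGuy = Encoding.UTF8.GetBytes(c.y ?? string.Empty); // GetBytes throws on null
var both = stringGuy.Concat(intGuy).Concat(new byte[1]).ToArray();
return both;
}
}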
Now you can encode a byte array like:
Convert.ToBase64String(byteArray);
There is no direct way to encode a struct. It's a value type at best, and it probably exists as a legacy of C and C++.

Range[] instead of get_Range()

The documentation at http://msdn.microsoft.com/en-us/library/microsoft.office.tools.excel.worksheet.get_range.aspx says to use the Range property instead of get_Range(Object Cell1, Object Cell2).
They both do the same thing: they get a Microsoft.Office.Interop.Excel.Range object that represents a cell or a range of cells. So what's the difference, apart from one being a method and the other a property? Why does the documentation point to Range[]; what's the reason for it?
Range() is faster than Range[]
In practice we have noticed this to be the case, but a reason should be given for saying so.
This shortcut is convenient when you want to refer to an absolute range. However, it is not as flexible as the Range property, because it cannot handle variable input such as strings or object references. So at the end of the day you will still end up writing the reference the long way, even though the shorthand is more readable. Hence you might as well get it right the first time without spending more resources.
Now, why is it slow? Because of the compilation.
"During run-time Excel always uses conventional notation (or so I've been told), so when the code is being compiled all references in shortcut notation must be converted to conventional range form (or so I've been told). {ie [A150] must be converted to Range("A150") form}. Whatever the truth of what I've been told, Visual Basic has to memorize both its compiled version of the code and whatever notation you used to write your code (i.e. whatever's in the code module), the workbook properties for the file size (the memory used) thus goes up slightly. "
As you can see, my answer is more in line with VBA. However, after some research it appears that the VBA side doesn't cause much slowdown, so you only need to take care of the C# side. @Hans gives you a better answer from the C# perspective. Hopefully, by combining both, you will get well-performing code :)
Here are some findings on the performance of Range[] vs Range() in Excel.
If you use C# version 4 and up then you can use the Range indexer. But you have to use get_Range() on earlier versions.
Do note that there is something special about it: the default property of a COM interface maps to the indexer, but the Range property is not the default property of a Worksheet; it is just a regular property. The trouble is that C# does not permit declaring indexed properties other than the indexer. This works in VB.NET but not in C#, where you had to call the property getter method directly. By popular demand, the C# team dropped this restriction in version 4 (VS2010), but only for COM interfaces; you still cannot declare indexed properties in your own code.
I have used both and both returned the same results. I think Range[] actually uses get_Range() internally.
For the sake of naming conventions, I only use Range[] now.

Why aren't C#'s Math.Min/Max variadic?

I need to find the minimum between 3 values, and I ended up doing something like this:
Math.Min(Math.Min(val1, val2), val3)
It just seems a little silly to me, because other languages use variadic functions for this. I highly doubt this was an oversight though.
Is there any reason why a simple Min/Max function shouldn't be variadic? Are there performance implications? Is there a variadic version that I didn't notice?
If it is a collection (an implementation of IEnumerable<T>), one could easily use the extension methods in the System.Linq namespace:
int min = new int[] {2,3,4,8}.Min();
Furthermore, it's easy to implement these methods on your own:
using System.Linq;
public static class Maths {
// Enumerable.Min/Max compare the elements with Comparer<T>.Default.
public static T Min<T> (params T[] vals) {
return vals.Min();
}
public static T Max<T> (params T[] vals) {
return vals.Max();
}
}
You can call these methods with just simple scalars so Maths.Min(14,25,13,2) would give 2.
These are generic methods, so there is no need to implement them for each numeric type (int, float, ...).
I think the basic reason these methods are not implemented in general is that every call would have to create an array (or at least an IList object). By keeping low-level two-argument methods, one avoids that overhead. However, I agree that these methods could be added to the Math class to make life easier for some programmers.
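The overhead argument can be sketched as follows: fixed-arity overloads avoid allocating a `params` array for the common two- and three-argument cases, and a `params` overload covers the rest with n - 1 comparisons. `MathEx` is a hypothetical name; some framework methods, such as `String.Concat`, mix fixed-arity and `params` overloads in a similar way.

```csharp
using System;

// Hypothetical helper: no array allocation for 2 or 3 arguments.
public static class MathEx
{
    public static int Min(int a, int b) => Math.Min(a, b);

    public static int Min(int a, int b, int c) => Math.Min(Math.Min(a, b), c);

    // Fallback for any other number of arguments.
    public static int Min(params int[] values)
    {
        if (values == null || values.Length == 0)
            throw new ArgumentException("At least one value is required.", nameof(values));
        int min = values[0];
        // n - 1 comparisons for n values.
        for (int i = 1; i < values.Length; i++)
            if (values[i] < min)
                min = values[i];
        return min;
    }
}
```

C# overload resolution prefers a method applicable in its normal form over one applicable only in its expanded `params` form, so `MathEx.Min(3, 1, 2)` calls the three-argument overload without allocating, while `MathEx.Min(14, 25, 13, 2)` falls through to the array version.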
CommuSoft has addressed how to accomplish the equivalent in C#, so I won't retread that part.
To specifically address your question "Why aren't C#'s Math.Min/Max variadic?", two thoughts come to mind.
First, Math.Min (and Math.Max) is not, in fact, a C# language feature, it is a .NET framework library feature. That may seem pedantic, but it is an important distinction. C# does not, in fact, provide any special purpose language feature for determining the minimum or maximum value between two (or more) potential values.
Secondly, as Eric Lippert has pointed out a number of times, language features (and presumably framework features) are not "removed" or actively excluded - all features are unimplemented until someone designs, implements, tests, documents and ships the feature. See here for an example.
Not being a .NET framework developer, I cannot speak to the actual decision process that occurred, but it seems like this is a classic case of a feature that simply never rose to the level of inclusion, similar to the sequence foreach "feature" Eric discusses in the provided link.
I think CommuSoft is providing a robust answer that is at least suited for people searching for something along these lines, and that should be accepted.
With that said, the reason is definitely to avoid the overhead necessary for the less likely use case that people want to compare a group rather than two values.
As pointed out by @arx, using a params array would be unnecessary overhead for the most common case, and it would also add the overhead of the loop that would be needed internally to go through the array n - 1 times.
I can easily see an argument for having created the method in addition to the basic form, but with LINQ that's just no longer necessary.

How to alias a built-in type in C#?

So in C++, I'm used to being able to do:
typedef int PeerId;
This allows me to make a type more self-documenting, but additionally also allows me to make PeerId represent a different type at any time without changing all of the code. I could even turn PeerId into a class if I wanted. This kind of extensibility is what I want to have in C#, however I am having trouble figuring out how to create an alias for 'int' in C#.
I think I could use the using directive, but I believe it is only scoped to the current file, so that won't work (the alias needs to be accessible from multiple files without being redefined). I also can't derive a class from built-in types (normally this is what I would do to alias reference types, such as List or Dictionary). I'm not sure what I can do. Any ideas?
You need to use the full type name like this:
using DWORD = System.Int32;
You could (ab)use implicit conversions:
struct PeerId
{
private int peer;
public static implicit operator PeerId(int i)
{
return new PeerId {peer=i};
}
public static implicit operator int(PeerId p)
{
return p.peer;
}
}
This takes the same space as an int, and you can do:
PeerId p = 3;
int i = p;
But I agree you probably don't need this.
Summary
Here's the short answer:
Typedefs are actually a variable used by compile-time code generators.
C# is being designed to avoid adding code generation language constructs.
Therefore, the concept of typedefs doesn't fit in well with the C# language.
Long Answer
In C++, it makes more sense: C++ started off as a precompiler that spit out C code, which was then compiled. This "code generator" beginning still has effects in modern C++ features (i.e., templates are essentially a Turing-complete language for generating classes and functions at compile time). In this context, a typedef makes sense because it's a way to get the "result" of a compile-time type factory or "algorithm" that "returns" a type.
In this strange meta-language (which few outside of Boost have mastered), a typedef is actually a variable.
What you're describing is less complex, but you're still trying to use the typedef as a variable. In this case, it's used as an input variable. So when other code uses the typedef, it's really not using that type directly. Rather, it's acting as a compile-time code generator, building classes and methods based on typedef'ed input variables. Even if you ignore C++ templates and just look at C typedefs, the effect is the same.
C++ and Generative Programming
C++ was designed to be a multi-paradigm language (OO and procedural, but not functional until Boost came out). Interestingly enough, templates have evolved an unexpected paradigm: generative programming. (Generative programming existed before C++, but C++ made it popular.) Generative programs are actually meta-programs that, when compiled, generate the needed classes and methods, which are in turn compiled into executables.
C# and Generative Programming
Our tools are slowly evolving in the same direction. Of course, reflection emit can be used for "manual" generative programming, but it is quite painful. The way LINQ providers use expression trees is very generative in nature. T4 templates get really close but still fall short. The "compiler as a service" which will hopefully be part of C# vNext appears most promising of all, if it could be combined with some kind of type variable (such as a typedef).
This one piece of the puzzle is still missing: generative programs need some sort of automatic trigger mechanism (in C++, this is handled by implicit template instantiation).
However, it is explicitly not a goal of C# to have any kind of "code generator" in the C# language like C++ templates (probably for the sake of understandability; very few C++ programmers understand C++ templates). This will probably be a niche satisfied by T4 rather than C#.
Conclusion (repeating the Summary)
All of the above is to say this:
Typedefs are a variable used by code generators.
C# is being designed to avoid adding code generation language constructs.
Therefore, the concept of typedefs doesn't fit in well with the C# language.
I also sometimes feel I need (integer) typedefs for similar purposes to the OP.
If you do not mind the casts being explicit (I actually want them to be) you can do this:
enum PeerId : int {};
This also works for byte, sbyte, short, ushort, uint, long, or ulong (obviously).
Not exactly the intended usage of enum, but it does work.
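A short sketch of how that enum-based alias behaves; the `Peers` helper class is hypothetical. Conversions in both directions need explicit casts, with the one exception that the constant `0` converts implicitly to any enum type.

```csharp
// Distinct type with int storage; mixing it with int requires a cast.
public enum PeerId : int { }

public static class Peers
{
    // Arithmetic goes through the underlying int explicitly.
    public static PeerId Next(PeerId id) => (PeerId)((int)id + 1);

    public static string Describe(PeerId id) => $"peer #{(int)id}";
}
```

So `PeerId p = (PeerId)42;` compiles, while `PeerId q = 42;` does not, which is exactly the explicitness the answer asks for.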
Since C# 10 you can use global using:
global using PeerId = System.Int32;
It works for all files.
It should appear before all using directives without the global modifier.
See using directive.
Redefining fundamental types just for the sake of changing the name is C++ thinking and does not sit well with the more purely object-oriented C#. Whenever you get the urge to shoehorn a concept from one language into another, stop and think about whether it makes sense, and try to stay native to the platform.
The requirement of being able to change the underlying type easily can be satisfied by defining your own value type. Coupled with implicit conversion operators and arithmetic operators, this lets you define very expressive types. If you are worried that adding layers on top of simple types will hurt performance, don't be: there is a 99% chance that it won't, and in the 1% of cases where it does, it will not be the "low-hanging fruit" of performance optimization.
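A sketch of such a value type, assuming a hypothetical `PeerId` wrapper around `int`. The underlying type appears only inside this struct, so swapping it later (say, for `long`) is mostly a local change, while callers keep writing `PeerId`.

```csharp
using System;

// Hypothetical wrapper: int is named only inside this struct.
public readonly struct PeerId : IEquatable<PeerId>
{
    public int Value { get; }

    public PeerId(int value) => Value = value;

    // Implicit in, explicit out: ints flow in freely, but getting the raw value back is visible.
    public static implicit operator PeerId(int value) => new PeerId(value);
    public static explicit operator int(PeerId id) => id.Value;

    public static PeerId operator +(PeerId id, int offset) => new PeerId(id.Value + offset);

    public static bool operator ==(PeerId a, PeerId b) => a.Value == b.Value;
    public static bool operator !=(PeerId a, PeerId b) => a.Value != b.Value;

    public bool Equals(PeerId other) => Value == other.Value;
    public override bool Equals(object obj) => obj is PeerId other && Equals(other);
    public override int GetHashCode() => Value.GetHashCode();
    public override string ToString() => $"PeerId({Value})";
}
```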

Using Unsigned Primitive Types

Most of the time we represent concepts that can never be less than 0. For example, to declare a length, we write:
int length;
The name expresses its purpose well but you can assign negative values to it. It seems that for some situations, you can represent your intent more clearly by writing it this way instead:
uint length;
Some disadvantages that I can think of:
unsigned types (uint, ulong, ushort) are not CLS-compliant, so you can't use them with other languages that don't support them
.NET classes use signed types most of the time, so you have to cast
Thoughts?
“When in Rome, do as the Romans do.”
While there is theoretically an advantage in using unsigned values where applicable because it makes the code more expressive, this is simply not done in C#. I'm not sure why the developers initially didn't design the interfaces to handle uints and make the type CLS compliant but now the train has left the station.
Since consistency is generally important I'd advise taking the C# road and using ints.
If you decrement a signed number with a value of 0, it becomes negative and you can easily test for this. If you decrement an unsigned number with a value of 0, it underflows and becomes the maximum value for the type - somewhat more difficult to check for.
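A quick illustration of that difference; the `Underflow` class is just a hypothetical container for the example. Both decrements run in an `unchecked` context, which is what C# uses for non-constant expressions unless overflow checking is turned on.

```csharp
public static class Underflow
{
    // 0 - 1 on a signed int gives -1, which is easy to test for.
    public static int DecrementSigned(int x) => unchecked(x - 1);

    // 0 - 1 on a uint wraps around to uint.MaxValue (4294967295).
    public static uint DecrementUnsigned(uint x) => unchecked(x - 1);
}
```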
Your second point is the most important. Generally you should just use int since that's a pretty good "catch-all" for integer values. I would only use uint if you absolutely need the ability to count higher than int, but without using the extra memory long requires (it's not much more memory, so don't be cheap :-p).
I think the subtle use of uint vs. int will cause confusion among developers unless it is written into the company's developer guidelines.
If the length, for example, can't be less than zero then it should be expressed clearly in the business logic so future developers can read the code and know the true intent.
Just my 2 cents.
I will point out that in C# you can turn on /checked to check for arithmetic overflow / underflow, which isn't a bad idea anyways. If performance matters in a critical section, you can still use unchecked to avoid this.
For internal code (i.e. code that won't be referenced in any interop manner by other languages), I vote for using unsigned types when the situation warrants it, such as the length variables mentioned earlier. This, along with checked arithmetic, provides one more safety net for developers, catching subtle bugs earlier.
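A sketch of that combination for an unsigned length; `Lengths`, `Shrink` and `TryShrink` are hypothetical names. The `checked` expression turns on overflow checking locally; the project-wide switch is the `/checked` compiler option (`CheckForOverflowUnderflow` in an MSBuild project file).

```csharp
using System;

public static class Lengths
{
    // In a checked context, underflow throws OverflowException instead of wrapping.
    public static uint Shrink(uint length, uint amount) => checked(length - amount);

    // Non-throwing variant that reports the underflow to the caller.
    public static bool TryShrink(uint length, uint amount, out uint result)
    {
        try
        {
            result = Shrink(length, amount);
            return true;
        }
        catch (OverflowException)
        {
            result = 0;
            return false;
        }
    }
}
```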
Another point in the signed vs unsigned debate is that some programmers use values such as -1 to indicate errors, when they wouldn't otherwise have meaning. I subscribe to the view that each variable should have only one purpose, but if you - or colleagues you code with - like to indicate errors in this way, leaving variables signed gives you the flexibility to add error states later.
Your two points are good. The primary reason to avoid it is casting, though: casts make unsigned types incredibly annoying to use. I tried using unsigned variables once, but I had to sprinkle casts absolutely everywhere because the framework methods all use signed integers. Therefore, whenever you call a framework method, you have to cast.
