What are some advantages of making enum in Java similar to a class, rather than just a collection of constants as in C/C++?
You get free compile time checking of valid values. Using
public static int OPTION_ONE = 0;
public static int OPTION_TWO = 1;
does not ensure
void selectOption(int option) {
...
}
will only accept 0 or 1 as a parameter value. Using an enum, that is guaranteed. Moreover, this leads to more self-documenting code, because you can use code completion to see all enum values.
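As a rough sketch of the enum-based version of the example above (the names Option and OptionSelector are hypothetical, chosen only to mirror the int constants):

```java
// Hypothetical enum replacing the OPTION_ONE / OPTION_TWO int constants.
enum Option { ONE, TWO }

class OptionSelector {
    // Only Option.ONE or Option.TWO (or null) can be passed here;
    // a call such as selectOption(5) is rejected at compile time.
    static String selectOption(Option option) {
        switch (option) {
            case ONE: return "first";
            case TWO: return "second";
            default: throw new IllegalArgumentException("unknown option: " + option);
        }
    }
}
```

Code completion on Option then lists exactly the legal values.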
Type safety is one reason.
Another, that I find more important, is that you can attach metadata to enum values in Java. For example, you could use an enum to define the set of legal operations for a webservice, and then attach metadata for the type of request and data class:
AddItem(HttpMethod.POST, ProductEntry.class),
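The fragment above could be filled out roughly as follows. This is only a sketch: HttpMethod, ProductEntry, ProductQuery and the Operation enum itself are hypothetical stand-ins, not types from any real web service library.

```java
// All names here are hypothetical stand-ins for the web service example.
enum HttpMethod { GET, POST, DELETE }

class ProductEntry { }
class ProductQuery { }

enum Operation {
    ADD_ITEM(HttpMethod.POST, ProductEntry.class),
    FIND_ITEM(HttpMethod.GET, ProductQuery.class);

    // Metadata attached to each enum constant.
    private final HttpMethod method;
    private final Class<?> dataClass;

    Operation(HttpMethod method, Class<?> dataClass) {
        this.method = method;
        this.dataClass = dataClass;
    }

    HttpMethod method() { return method; }
    Class<?> dataClass() { return dataClass; }
}
```

A dispatcher could then loop over Operation.values() and read the request type and data class from each constant.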
Java 5 enums originated from a typesafe enum pattern from Joshua Bloch's Effective Java (the first edition) to avoid the pitfalls of enums in C/C++/C# (which are simply thinly-veiled int constants) and the use in Java of final static int constants.
Primarily int constants and int enums aren't typesafe. You can pass in any int value. In C/C++ you can do this:
enum A { one, two, three };
enum B { beef, chicken, pork } b = beef;
void func(A a) { ... }
func((A)b);
Unfortunately the typesafe enum pattern from Effective Java had a lot of boilerplate, not all of it obvious. Most notably, you had to declare a private readResolve method to stop Java from creating new instances on deserialization, which would break simple reference checking (i.e. using the == operator instead of equals()).
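To give an idea of that boilerplate, here is a minimal sketch of a hand-written typesafe enum in the pre-Java 5 style. Suit and SuitDemo are hypothetical names, and the sketch is not taken verbatim from Effective Java.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hand-written typesafe enum; Suit is a hypothetical example type.
final class Suit implements Serializable {
    private static final long serialVersionUID = 1L;

    public static final Suit CLUBS = new Suit("CLUBS", 0);
    public static final Suit HEARTS = new Suit("HEARTS", 1);
    private static final Suit[] VALUES = { CLUBS, HEARTS };

    private final String name;
    private final int ordinal;

    // Private constructor: no instances beyond the constants above.
    private Suit(String name, int ordinal) {
        this.name = name;
        this.ordinal = ordinal;
    }

    @Override
    public String toString() { return name; }

    // Swap the freshly deserialized copy for the canonical constant,
    // so that == comparisons keep working after deserialization.
    private Object readResolve() {
        return VALUES[ordinal];
    }
}

class SuitDemo {
    // Serializes and deserializes a Suit in memory.
    static Suit roundTrip(Suit suit) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(suit);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Suit) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With readResolve in place, SuitDemo.roundTrip(Suit.HEARTS) == Suit.HEARTS holds; a Java 5 enum gives the same guarantee with none of this code.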
So Java 5 enums offer these advantages over ints:
Type safety;
Java 5 enums can have behaviour and implement interfaces;
Java 5 enums have some extremely lightweight data structures like EnumSet and EnumMap.
Java 5 enums offer these advantages over just using classes:
Less error-prone boilerplate (private constructor, readResolve() etc);
Semantic correctness. You see something is an enum and you know it's just representing a value. You see a class and you're not sure. Maybe there's a static factory method somewhere, etc. Java 5 enums much more clearly indicate intent.
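The points about behaviour, interfaces and EnumSet can be sketched together. BasicOp and OpDemo are hypothetical names; IntBinaryOperator and EnumSet are standard JDK types (the former needs Java 8 or later).

```java
import java.util.EnumSet;
import java.util.function.IntBinaryOperator;

// An enum that implements an interface, with per-constant behaviour.
enum BasicOp implements IntBinaryOperator {
    PLUS {
        @Override
        public int applyAsInt(int a, int b) { return a + b; }
    },
    TIMES {
        @Override
        public int applyAsInt(int a, int b) { return a * b; }
    };
}

class OpDemo {
    // Applies every operation in the set and adds up the results.
    static int sumOfResults(EnumSet<BasicOp> ops, int a, int b) {
        int total = 0;
        for (BasicOp op : ops) {
            total += op.applyAsInt(a, b);
        }
        return total;
    }
}
```

An EnumSet such as EnumSet.of(BasicOp.PLUS) plays the role that a bit mask of int flags would play in C.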
Enums are already a class in Java.
If you're asking why this is better, I'd say that better type safety and the ability to add other attributes besides a mere ordinal value would come to mind.
In addition to better type safety, you can also define custom behavior in your enums (refer to Effective Java for some good examples).
You can use enums to effectively implement Singletons ^^:
public enum Elvis {
INSTANCE
}
Making enum a reference type that contains a fixed set of constants has led to efficient Map implementations like EnumMap and Set implementations like EnumSet (JDK classes).
From javadoc of EnumMap :
A specialized Map implementation for use with enum type keys. All of the keys in an enum map must come from a single enum type that is specified, explicitly or implicitly, when the map is created. Enum maps are represented internally as arrays. This representation is extremely compact and efficient.
EnumMap combines richness and type safety of Map with the speed of an array (Effective Java).
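A small sketch of EnumMap in use; Phase and PhaseLabels are hypothetical names, while EnumMap itself is the JDK class quoted above.

```java
import java.util.EnumMap;
import java.util.Map;

enum Phase { SOLID, LIQUID, GAS }

class PhaseLabels {
    // Builds a map keyed by enum constants; the key type is fixed when the map is created.
    static Map<Phase, String> labels() {
        Map<Phase, String> labels = new EnumMap<>(Phase.class);
        labels.put(Phase.GAS, "steam");
        labels.put(Phase.SOLID, "ice");
        return labels;
    }
}
```

Iteration follows the declaration order of the constants, so SOLID comes before GAS here even though it was inserted second.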
Enums are a type in their own right: you cannot use an enum that does not exist, or put in some other similar-looking constant. You can also enumerate them, so that code can be more concise.
Using static constants could potentially cause maintenance nightmares, especially if they are spread out.
The only real advantage is that it can be used in a switch statement. All the other stuff an enum is capable of can just be done with plain vanilla class with a private constructor whose instances in turn are declared as public static final fields of the class in question (the typesafe pattern). The other advantage of enum is obviously that it makes the code less verbose than you would do with a plain vanilla class.
But if I'm not mistaken, in C# (not C++) you can use a string in a switch statement, so that advantage of enums in Java is negligible by comparison. The same thing was proposed for Java and was added in Java 7.
Benefits of Using Enumerations:
An object can be created to work in the same manner as an enumeration. In fact,
enumerations were not even included in the Java language until version 5.0. However,
enumerations make code more readable and provide less room for programmer error.
OCA Java SE 7 Programmer I Study Guide
I have read a lot about the danger of using structures as a data type, I wonder if there is any problem with this one,
List<Summarized> SummarizedList = new List<Summarized>();
Summarized SumInfo;
struct Summarized
{
public string sBrand;
public string sModel;
public string sCustomer;
public int sline;
public string sLeader;
public int sDesire;
public int sReal;
}
As you can see, I use a generic list whose items are of type Summarized, which is the struct data type. Whenever I need to update an item of the list I just do the following:
SumInfo = (Summarized)SummarizedList[CurrentPos];
SumInfo.sDesire = DesireProd;
SumInfo.sReal = RealProduced;
SummarizedList[CurrentPos] = SumInfo;
where CurrentPos is the position of the item I want to update.
Everything works fine so far, so could be any problem with that in the future? Is this struct one of the mutable ones?
Thanks.
Everything works fine so far, so could be any problem with that in the future? Is this struct one of the mutable ones?
Yes, it's a mutable struct. Yes, it will work.
However, I would argue that this struct should be a class. For details as to why, refer to Choosing Between Classes and Structures. In this case, you're violating 3 of the cases which should always be true for structs:
It logically represents a single value, similar to primitive types (integer, double, and so on).
It has an instance size smaller than 16 bytes.
It is immutable.
This would strongly suggest a class is more appropriate for your type. It would also simplify your code.
BTW - Side note, the cast is not required in your code above. You should be able to just write:
SumInfo = SummarizedList[CurrentPos];
Issues:
It's a mutable struct, and they're almost always a bad idea (search for "mutable structs evil" and you'll get loads of hits)
It's got public fields - therefore no encapsulation; no separation between the API of the type and its implementation
It's got public members which don't follow the normal .NET naming conventions
It doesn't logically represent a single value, as per the .NET design guidelines
It's larger than the 16 bytes recommended by the same guidelines (although I wouldn't pay too much attention to that if everything else were okay)
Basically it's a dumb data bucket. There's a time and place for that, but it should almost always be a class in that case, and I'd personally usually try to make it an immutable type as well.
is there any reason you're using a struct? if you made it a class, the List would just contain references, and your code would look like:
SumInfo = SummarizedList[CurrentPos];
SumInfo.sDesire = DesireProd;
SumInfo.sReal = RealProduced;
// you're done! no need to insert it back in, you're referring to the same item
Personally, I would have nothing against using this struct. It may depend more on how you use it, whether you encapsulate the List methods etc.
The mutability of it depends on whether you are expecting to update any entries once you have added them to the list. If you are not expecting to, then your struct is effectively immutable, but your list isn't. However, in this case you are updating the entries, so it is mutable.
I would concur that a class is probably a better option for this.
Issue 1:
Yes the struct is mutable. And it suffers from all the problems associated with that.
SummarizedList[CurrentPos].sDesire=DesireProd;
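shows why using a mutable struct like this is a bad idea: with a List<T> the indexer returns a copy, so the assignment cannot change the struct stored in the list (the C# compiler rejects it with error CS1612).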
Issue 2:
You have public mutable fields. This is considered bad style and you should use properties instead.
Issue 3:
You're using (System) Hungarian notation. When developing in C#, follow the .NET naming conventions.
1) I’m aware of the following benefits:
they increase the level of abstraction since you immediately see what underlying integral values represent.
You can use them instead of magic numbers and by doing that making the code more understandable
They also restrict the values an enum variable can have and in doing so make the application safer, since programmers know which values are valid for a variable, so I guess they provide a sort of type safety
Are there any other benefits they provide over directly using integral values?
2) Why do they use integrals as an underlying type and not string?
thank you
You've listed a lot of the core reasons where enums are preferable to integral types.
Named constants are safer and more readable than magic numbers
Enums describe to programmers what they are for. Integral values don't.
Naturally limiting the set of values that can be passed in. (You've got the tip of the type-safety iceberg... but look deeper...)
You can also add:
Vastly increased Type Safety. If you accept an 'int', then any int can be passed in. If you accept a VehicleType, then only a VehicleType can be passed in. I'm not just talking about someone passing in 6 when the largest allowed number is 5. I mean what if you pass in FuelType.Unleaded to a function that thinks it means VehicleType.Aeroplane? With enums the compiler will tell you you're an idiot. An integral type says "yeah, 5 is fine with me" and your program exhibits really odd behaviour that may be extremely difficult to trace.
Easier refactoring. Just as with any magic constants, If you pass in the value 5 in a hundred places in your program, you're in trouble if you decide to change 5 to have a different meaning. With an enum (as long as you don't have binary backwards compatibility concerns) you can change the underlying values. You can also change the underlying type of an enum if you wish (byte -> int -> long) without having to do anything more than recompile the client code.
Bitfields are so much easier to work with when the bits and masks can be named. And if you add new bits, you can often arrange things so that merely updating the related masks will allow most of your existing code to handle the new bitfields perfectly without having to rewrite them from scratch.
Consistency throughout the program. If you are careful with obfuscation and type safety, enums allow you to represent a list of named values that a user chooses from with the same names in the code, but without the efficiency cost of using strings.
Everybody understands why constants are great in code. Enums simply give you a way of holding together a related group of constants. You could achieve the same thing in a messier manner using a namespace of consts.
Using an enum for a parameter rather than a bool not only makes the code self-documenting, readable, and less prone to mistakes; it also makes it much easier to add a third option when you realize that two options aren't enough.
As with all tools, enums can be misused. Just use them where they make sense.
2) Why use bytes or ints instead of strings? Simply they're small and efficient.
I would conjecture that they require underlying integral types to ensure simplicity of comparison and more easily support bit flags. Without that limitation, we, or the compiler, or the runtime, would likely have to resort to some fuzziness to do things like combinations - or we would get into a situation where - as you say - we shouldn't care about the underlying type (the point of the abstraction) and yet when we try to say A | B we get a runtime error because we used an underlying type that isn't capable of that type of operation.
One benefit is when you want to use enum as a flag.
So if you define an enum like this:
[Flags]
public enum TestEnum { None = 0, A = 1, B = 2, C = 4, D = 8 } // flag values must be distinct powers of two
Then if you have a method that accept an instance of TestEnum as a variable, you can combine the values from the enum, so you can send for example A | B | C as the parameter for the method. Then, inside the method, you can check the parameter like this:
if ((paramname & TestEnum.A) > 0)
{
//do things related to A
}
if ((paramname & TestEnum.B) > 0)
{
//do things related to B
}
//same for C and D
Also, I think the reasons you mention are good enough by themselves to use enums.
Also, regarding the comment that you can force a wrong value into an enum with code like (TestEnum)500: it's hard to do by accident if you do not want to break your code.
The point that the value 0 for an enum should be the default value, or in the case of flags "the absence of all other flags", is very important, since a field declared as TestEnum myenum will be initialized to 0 regardless of whether you have defined an enum value for 0 or not.
You can also parse an Enum from the string representation. You may get that string from a data source or user-entry.
I think you sold me on Enums at "magic numbers".
The main benefit of enum is that constants can be referred to in a consistent, expressive and type safe way.
Readability is of course the topmost advantage of using an enumeration.
Another advantage is that enumerated constants are generated automatically by the compiler.
For instance, if you had an enumerated constant type for error codes that could occur in your program, your enum definition could look something like this:
enum Error_Code
{
OUT_OF_MEMORY,
FILE_NOT_FOUND
};
OUT_OF_MEMORY is automatically assigned the value 0 (zero) by the compiler
because it appears first in the definition. FILE_NOT_FOUND is equal to 1, and so on.
If you were to approach the same example by using symbolic constants or Magic numbers, you write much more code to do the same.
In the "C# Coding Standard" by Juval Lowy available from www.idesign.net, the recommendation is made to use the C# predefined types instead of the aliases in the System namespace, e.g.:
object NOT Object
string NOT String
int NOT Int32
What is the benefit of this? How do they differ? I have followed this advice in my own coding but never knew how they differed.
The main time they are unexpectedly different is when someone is stupid enough to call a type (or property /field/etc) String (for example), since string always refers to global::System.String, where-as String could be YourNamespace.String.
The closest you can get to the C# alias is @string, which tends to stick out like a sore thumb.
I prefer the C# aliases.
btw, here's a fun way to mess with anyone using dynamic too much:
using dynamic = System.Object;
They don't really differ. Personally I use the aliases too, but Jeff Richter advocates the exact opposite. The generated code will be exactly the same. Use whichever you find most readable (and try to be consistent).
One thing most people agree on: when writing an API, use the type name rather than the alias, so:
int ReadInt32()
rather than
int ReadInt()
the int part doesn't matter here - it's not part of the name, and can be displayed appropriately to any consumer using any language... but the method name should be language-neutral, which means using the type name.
One place where you had to use the alias (in compilers before C# 6) is when specifying the underlying type for an enum:
enum Foo : byte // Valid
enum Foo : System.Byte // Invalid
In addition to what Jon said here is another difference.
var x = (Int32)-y; // Does not compile.
var x = (int)-y; // Negates the value of y and casts as an int.
This is because of a grammar disambiguation rule defined in §7.6.6 of the C# Programming Language specification.
I think using the 'blue' int, string, etc.. might be a little more intuitive to read. Otherwise, I use the class when calling a static method on it i.e. Int32.TryParse()
I always use the aliases when specifying the type in a parameter, property or method signature or field (so: almost everywhere) except when calling a static member on such a type.
String.Format("{0}", 1);
Int32.Parse("123");
String.IsNullOrEmpty(value);
Here's another compiler-based difference (in compilers before C# 6):
public enum MyEnum : Byte {Value1, Value2} //does not compile
public enum MyEnum : byte {Value1, Value2} //ok
The only difference is that they're nicer to read (this of course is a matter of opinion). The compiled result is exactly the same IL.
The Entity Framework code generator uses predefined types, so if you want to be able to implement the Visual Studio 2017 coding style rules fully you will need to choose predefined types (int instead of Int32, etc). Otherwise your generated code will not be in compliance.
(Options->Text Editor->C#->Code Style->General->predefined type preference)
I've been writing C# for seven years now, and I keep wondering, why do enums have to be of an integral type? Wouldn't it be nice to do something like:
enum ErrorMessage
{
NotFound: "Could not find",
BadRequest: "Malformed request"
}
Is this a language design choice, or are there fundamental incompatibilities on a compiler, CLR, or IL level?
Do other languages have enums with string or complex (i.e. object) types? What languages?
(I'm aware of workarounds; my question is, why are they needed?)
EDIT: "workarounds" = attributes or static classes with consts :)
The purpose of an Enum is to give more meaningful values to integers. You're looking for something else besides an Enum. Enums are compatible with older windows APIs and COM stuff, and a long history on other platforms besides.
Maybe you'd be satisfied with public const members of a struct or a class.
Or maybe you're trying to restrict some specialized type's values to only certain string values? But how it's stored and how it's displayed can be two different things - why use more space than necessary to store a value?
And if you want to have something like that readable in some persisted format, just make a utility or Extension method to spit it out.
This response is a little messy because there are just so many reasons. Comparing two strings for validity is much more expensive than comparing two integers. Comparing literal strings to known enums for static type-checking would be kinda unreasonable. Localization would be ... weird. Compatibility with would be broken. Enums as flags would be meaningless/broken.
It's an Enum. That's what Enums do! They're integral!
Perhaps use the description attribute from System.ComponentModel and write a helper function to retrieve the associated string from an enum value? (I've seen this in a codebase I work with and seemed like a perfectly reasonable alternative)
enum ErrorMessage
{
[Description("Could not find")]
NotFound,
[Description("Malformed request")]
BadRequest
}
What are the advantages, because I can only see drawbacks:
ToString will return a different string to the name of the enumeration. That is, ErrorMessage.NotFound.ToString() will be "Could not find" instead of "NotFound".
Conversely, with Enum.Parse, what would it do? Would it still accept the string name of the enumeration as it does for integer enumerations, or does it work with the string value?
You would not be able to implement [Flags] because what would ErrorMessage.NotFound | ErrorMessage.BadRequest equal in your example (I know that it doesn't really make sense in this particular case, and I suppose you could just say that [Flags] is not allowed on string-based enumerations but that still seems like a drawback to me)
While the comparison errMsg == ErrorMessage.NotFound could be implemented as a simple reference comparison, errMsg == "Could not find" would need to be implemented as a string comparison.
I can't think of any benefits, especially since it's so easy to build up your own dictionary mapping enumeration values to "custom" strings.
The real answer why: There's never been a compelling reason to make enums any more complicated than they are. If you need a simple closed list of values - they're it.
In .Net, enums were given the added benefit of internal representation <-> the string used to define them. This one little change adds some versioning downsides, but improves upon enums in C++.
The enum keyword is used to declare an
enumeration, a distinct type that
consists of a set of named constants
called the enumerator list.
Ref: msdn
Your question is with the chosen storage mechanism, an integer. This is just an implementation detail. We only get to peek beneath the covers of this simple type in order to maintain binary compatibility. Enums would otherwise have very limited usefulness.
Q: So why do enums use integer storage? As others have pointed out:
Integers are quick and easy to compare.
Integers are quick and easy to combine (bitwise for [Flags] style enums)
With integers, it's trivially easy to implement enums.
* None of these are specific to .NET, and it appears the CLR designers didn't feel compelled to change anything or add any gold plating to them.
That's not to say your syntax is entirely unappealing. But is the effort to implement this feature in the CLR, and all the compilers, justified? For all the work that goes into this, has it really bought you anything you couldn't already achieve (with classes)? My gut feeling is no, there's no real benefit. (There's a post by Eric Lippert I wanted to link to, but I couldn't find it)
You can write 10 lines of code to implement in user-space what you're trying to achieve without all the headache of changing a compiler. Your user-space code is easily maintained over time - although perhaps not quite as pretty as if it's built-in, but at the end of the day it's the same thing. You can even get fancy with a T4 code generation template if you need to maintain many of your custom enum-esque values in your project.
So, enums are as complicated as they need to be.
Not really answering your question but presenting alternatives to string enums.
public struct ErrorMessage
{
public const string NotFound="Could not find";
public const string BadRequest="Malformed request";
}
Perhaps because then this wouldn't make sense:
enum ErrorMessage: string
{
NotFound,
BadRequest
}
It's a language decision - eg., Java's enum doesn't directly correspond to an int, but is instead an actual class. There's a lot of nice tricks that an int enum gives you - you can bitwise them for flags, iterate them (by adding or subtracting 1), etc. But, there's some downsides to it as well - the lack of additional metadata, casting any int to an invalid value, etc.
I think the decision was probably made, as with most design decisions, because int enums are "good enough". If you need something more complex, a class is cheap and easy enough to build.
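For comparison, here is roughly what the class-based Java approach mentioned above looks like for the same example. This is a Java sketch, not C#; the ErrorMessage enum and its description() accessor are hypothetical names modelled on the question.

```java
// A Java enum is a class, so each constant can carry a string (or any object).
enum ErrorMessage {
    NOT_FOUND("Could not find"),
    BAD_REQUEST("Malformed request");

    private final String description;

    ErrorMessage(String description) {
        this.description = description;
    }

    String description() { return description; }
}
```

Note that name() and valueOf() still work with the identifier (NOT_FOUND), not with the attached string, which keeps parsing and display separate.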
Static readonly members give you the effect of complex enums, but don't incur the overhead unless you need it.
sealed class ErrorMessage {
public string Description { get; private set; }
public int Ordinal { get; private set; }
// Private constructor: only the static members below can create instances.
private ErrorMessage() { }
public static readonly ErrorMessage NotFound = new ErrorMessage() {
Ordinal = 0, Description = "Could not find"
};
public static readonly ErrorMessage BadRequest = new ErrorMessage() {
Ordinal = 1, Description = "Malformed Request"
};
}
Strictly speaking, the intrinsic representation of an enum shouldn't matter, because by definition, they are enumerated types. What this means is that
public enum PrimaryColor { Red, Blue, Yellow }
represents a set of values.
Firstly, some sets are smaller, whereas other sets are larger. Therefore, the .NET CLR allows one to base an enum on an integral type, so that the domain size for enumerated values can be increased or decreased, i.e., if an enum was based on a byte, then that enum cannot contain more than 256 distinct values, whereas one based on a long can contain 2^64 distinct values. This is enabled by the fact that a long is 8 times larger than a byte.
Secondly, an added benefit of restricting the base type of enums to integral values is that one can perform bitwise operations on enum values, as well as create bitmaps of them to represent more than one value.
Finally, integral types are the most efficient data types available inside a computer, therefore, there is a performance advantage when it comes to comparing different enum values.
For the most part, I would say representing enums by integral types seems to be a CLR and/or CLS design choice, though one that is probably not very difficult to arrive at.
The main advantage of integral enums is that they don't take up much space in memory. An instance of a default System.Int32-backed enum takes up just 4-bytes of memory and can be compared quickly to other instances of that enum.
In contrast, string-backed enums would be reference types that require each instance to be allocated on the heap and comparisons to involve checking each character in a string. You could probably minimize some of the issues with some creativity in the runtime and with compilers, but you'd still run into similar problems when trying to store the enum efficiently in a database or other external store.
While it also counts as an "alternative", you can still do better than just a bunch of consts:
struct ErrorMessage
{
public static readonly ErrorMessage NotFound =
new ErrorMessage("Could not find");
public static readonly ErrorMessage BadRequest =
new ErrorMessage("Bad request");
private string s;
private ErrorMessage(string s)
{
this.s = s;
}
public static explicit operator ErrorMessage(string s)
{
return new ErrorMessage(s);
}
public static explicit operator string(ErrorMessage em)
{
return em.s;
}
}
The only catch here is that, as any value type, this one has a default value, which will have s==null. But this isn't really different from Java enums, which themselves can be null (being reference types).
In general, Java-like advanced enums cross the line between actual enums, and syntactic sugar for a sealed class hierarchy. Whether such sugar is a good idea or not is arguable.
So in C++, I'm used to being able to do:
typedef int PeerId;
This allows me to make a type more self-documenting, but additionally also allows me to make PeerId represent a different type at any time without changing all of the code. I could even turn PeerId into a class if I wanted. This kind of extensibility is what I want to have in C#, however I am having trouble figuring out how to create an alias for 'int' in C#.
I think I can use the using statement, but it only has scope in the current file I believe, so that won't work (The alias needs to be accessible between multiple files without being redefined). I also can't derive a class from built-in types (but normally this is what I would do to alias ref-types, such as List or Dictionary). I'm not sure what I can do. Any ideas?
You need to use the full type name like this:
using DWORD = System.Int32;
You could (ab)use implicit conversions:
struct PeerId
{
private int peer;
public static implicit operator PeerId(int i)
{
return new PeerId {peer=i};
}
public static implicit operator int(PeerId p)
{
return p.peer;
}
}
This takes the same space as an int, and you can do:
PeerId p = 3;
int i = p;
But I agree you probably don't need this.
Summary
Here's the short answer:
Typedefs are actually a variable used by compile-time code generators.
C# is being designed to avoid adding code generation language constructs.
Therefore, the concept of typedefs doesn't fit in well with the C# language.
Long Answer
In C++, it makes more sense: C++ started off as a precompiler that spit out C code, which was then compiled. This "code generator" beginning still has effects in modern C++ features (i.e., templates are essentially a Turing-complete language for generating classes and functions at compile time). In this context, a typedef makes sense because it's a way to get the "result" of a compile-time type factory or "algorithm" that "returns" a type.
In this strange meta-language (which few outside of Boost have mastered), a typedef is actually a variable.
What you're describing is less complex, but you're still trying to use the typedef as a variable. In this case, it's used as an input variable. So when other code uses the typedef, it's really not using that type directly. Rather, it's acting as a compile-time code generator, building classes and methods based on typedef'ed input variables. Even if you ignore C++ templates and just look at C typedefs, the effect is the same.
C++ and Generative Programming
C++ was designed to be a multi-paradigm language (OO and procedural, but not functional until Boost came out). Interestingly enough, templates have evolved an unexpected paradigm: generative programming. (Generative programming was around before C++, but C++ made it popular). Generative programs are actually meta-programs that - when compiled - generate the needed classes and methods, which are in turn compiled into executables.
C# and Generative Programming
Our tools are slowly evolving in the same direction. Of course, reflection emit can be used for "manual" generative programming, but it is quite painful. The way LINQ providers use expression trees is very generative in nature. T4 templates get really close but still fall short. The "compiler as a service" which will hopefully be part of C# vNext appears most promising of all, if it could be combined with some kind of type variable (such as a typedef).
This one piece of the puzzle is still missing: generative programs need some sort of automatic trigger mechanism (in C++, this is handled by implicit template instantiation).
However, it is explicitly not a goal of C# to have any kind of "code generator" in the C# language like C++ templates (probably for the sake of understandability; very few C++ programmers understand C++ templates). This will probably be a niche satisfied by T4 rather than C#.
Conclusion (repeating the Summary)
All of the above is to say this:
Typedefs are a variable used by code generators.
C# is being designed to avoid adding code generation language constructs.
Therefore, the concept of typedefs doesn't fit in well with the C# language.
I also sometimes feel I need (integer) typedefs for similar purposes to the OP.
If you do not mind the casts being explicit (I actually want them to be) you can do this:
enum PeerId : int {};
Will also work for byte, sbyte, short, ushort, uint, long, or ulong (obviously).
Not exactly the intended usage of enum, but it does work.
Since C# 10 you can use global using:
global using PeerId = System.Int32;
It works for all files.
It should appear before all using directives without the global modifier.
See using directive.
Redefining fundamental types just for the sake of changing the name is C++ think and does not sit well with the more pure Object Orientated C#. Whenever you get the urge to shoehorn a concept from one language into another, you must stop and think whether or not it makes sense and try to stay native to the platform.
The requirement of being able to change the underlying type easily can be satisfied by defining your own value type. Coupled with implicit conversion operators and arithmetic operators, you have the power to define very powerful types. If you are worried about the performance cost of adding layers on top of simple types, don't. There is a 99% chance that it won't matter, and in the 1% of cases where it does, it will not be the "low hanging fruit" of performance optimization.