Is it possible to implement IEquatable from a struct, but pass the compared argument by reference?
Example:
struct BigToCopy : IEquatable<BigToCopy>
{
int x, y, z, w, foo, bar; // etc.
bool Equals(in BigToCopy other) // does not compile with 'in' parameter
{
}
}
It does not compile because, with the 'in' keyword, the method seemingly no longer implements IEquatable&lt;BigToCopy&gt;.
This one is ok, but it copies by value:
struct BigToCopy : IEquatable<BigToCopy>
{
int x, y, z, w, foo, bar; // etc.
bool Equals(BigToCopy other) // does compile
{
}
}
IEqualityComparer has the same problem as far as I can tell. It requires the arguments to be passed by value.
Would compiler optimizations avoid the copy even if we pass by value? Note: I'm on Unity, using Mono and IL2CPP.
Edit: ok, let's be more concrete. We have a number of classes in our code base that are implemented like this:
struct Vertex
{
public Point3d Position; // 3 doubles
public Vector3d Normal; // 3 doubles
public Vector2d UV; // 2 doubles
public Color Color; // 4 bytes
// possibly some other vertex data
public override bool Equals(object obj) { ... }
public bool Equals(in Vertex vertex) { ... }
}
There are multiple such classes. They are put in collections as HashSets, Dictionaries, etc. They are also being compared by explicit calls of Equals in some cases. These objects can be created in thousands, maybe even millions, and are processed in some way. They are usually in a hot code path.
Recently, I have found out that dictionary lookup using these objects can lead to object allocation. The reason: these objects don't implement IEquatable. Therefore, the dictionary calls Equals(object obj) overload instead, which leads to boxing (= memory allocation).
I'm fixing these classes right now, by implementing the IEquatable interface and removing the 'in' keyword. But there is some existing code that calls these Equals methods directly, and I was not sure whether I was affecting the performance badly. I could try, but there are many classes I'm fixing this way, so it is too much work to check.
Instead, I add another method:
public bool EqualsPassByRef(in Vertex vertex) { ... }
and replace explicit calls of Equals by EqualsPassByRef.
It works ok this way. The performance would be better compared to before. I just wondered: maybe there is a way to make C# call the 'in' version from the dictionary. Then the 'EqualsPassByRef' versions would not be needed, making the code look better (and possibly also faster in dictionary lookup). From the answers I conclude that it is not possible. Ok, that's still fine.
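A minimal sketch of the resulting pattern (the field layout and method bodies are illustrative; EqualsPassByRef is the name used above):

```csharp
using System;

public struct Vertex : IEquatable<Vertex>
{
    public double PX, PY, PZ;   // position
    public double NX, NY, NZ;   // normal

    // By-value overload: satisfies IEquatable<Vertex>, so Dictionary/HashSet
    // lookups avoid the boxing Equals(object) path.
    public bool Equals(Vertex other) => EqualsPassByRef(in other);

    // Pass-by-reference overload for explicit call sites on hot paths.
    public bool EqualsPassByRef(in Vertex other) =>
        PX == other.PX && PY == other.PY && PZ == other.PZ &&
        NX == other.NX && NY == other.NY && NZ == other.NZ;

    public override bool Equals(object obj) => obj is Vertex v && EqualsPassByRef(in v);

    // HashCode.Combine requires .NET Standard 2.1; substitute a manual hash on older Unity runtimes.
    public override int GetHashCode() => HashCode.Combine(PX, PY, PZ, NX, NY, NZ);
}
```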
Btw: I'm new to C#. I'm coming from a C++ world.
I just started programming in C# last year and I'm still learning the language. I have a question regarding readonly struct types and equality comparison methods.
When creating a struct in C#, I know it's usually considered a best practice to implement IEquatable&lt;T&gt;, as the default reflection-based comparison is very slow. I also learned that in C# 7.2 and later we can define readonly structs, and for these types we can also use the in parameter modifier to avoid unnecessary copying.
As structs are usually defined as immutable readonly types, I suppose it's not unusual to define Equals methods for readonly structs.
Given the above facts, however, what I'm wondering is whether there is a good, effective way to implement equality comparison methods for them. My point is that none of these equality methods and operators actually needs to modify its parameters, so I want to utilize in parameters somehow to save unnecessary copying.
Below is my attempt at doing this:
public readonly struct Point : IEquatable<Point>
{
public int X { get; }
public int Y { get; }
public Point(int x, int y)
{
X = x;
Y = y;
}
// Explicitly implementing IEquatable<Point> and delegating to an Equals method taking in param.
bool IEquatable<Point>.Equals(Point other) => Equals(other);
public bool Equals(in Point other) => X == other.X && Y == other.Y;
public override bool Equals(object? obj) => obj is Point other && Equals(other);
public static bool operator ==(in Point left, in Point right) => left.Equals(right);
public static bool operator !=(in Point left, in Point right) => !left.Equals(right);
public override int GetHashCode() => HashCode.Combine(X, Y);
public override string ToString() => $"Point({X}, {Y})";
}
The above certainly works, but I don't think it's a perfect solution, as a copy is still made whenever it's called through the IEquatable&lt;Point&gt; interface. Note that I can't simply implement IEquatable&lt;Point&gt; implicitly with an in parameter, because an Equals method taking the in modifier is considered to have a different signature and is treated as an overload.
Is there a known best practice to implement this correctly?
What I would really like to know is whether there are known best practices and patterns for effectively implementing equality for such readonly structs. I'm especially interested in a way to properly utilize the in parameter modifier when implementing equality comparison methods.
So far I haven't found a satisfactory answer on the Web, and I also checked the source code of the core library. For example, System.DateTime is now defined as a readonly struct and is rather big, but the in parameter isn't utilized there. (Existing types may well need to preserve compatibility; I know they often have to compromise.)
Note that the Point struct defined above is small, comprising only two 32-bit slots, so copying might not actually be a big problem here; it's just meant as a simple illustrative example.
Update for .NET 6 (C# 10)
Now that C# 10 has officially been released, the original question has become almost obsolete. We can now declare such a type as a readonly record struct. Of course, depending on your model, it can also be defined as a normal reference-type record (class).
https://devblogs.microsoft.com/dotnet/welcome-to-csharp-10/
The point is that optimized equality comparison methods and operators are automatically generated for record types.
public readonly record struct Person
{
public string FirstName { get; init; }
public string LastName { get; init; }
}
The above is fine; note, however, that the advantage of in on readonly value types only really kicks in for types that are non-trivial in size. In the case of two integers, you're probably over-thinking things.
You can't remove the need to use pass-by-value on the IEquatable<T> scenario, since that is how the API is defined.
It may also be worth noting that the in usage may make it hard to consume this API from languages other than C#; VB has poor support for this, as an example. Whether this is important depends on your target audience.
UPDATE: the next version of C# has a feature under consideration that would directly answer this issue; cf. the answers below.
Requirements:
App data is stored in arrays-of-structs. There is one AoS for each type of data in the app (e.g. one for MyStruct1, another for MyStruct2, etc)
The structs are created at runtime; the more code we write in the app, the more there will be.
I need one class to hold references to ALL the AoS's, and allow me to set and get individual structs within those AoS's
The AoS's tend to be large (1,000's of structs per array); copying those AoS's around would be a total fail - they should never be copied! (they never need to!)
I have code that compiles and runs, and it works ... but is C# silently copying the AoS's under the hood every time I access them? (see below for full source)
public Dictionary<System.Type, System.Array> structArraysByType = new Dictionary<System.Type, System.Array>();
public void registerStruct<T>()
{
System.Type newType = typeof(T);
if (!structArraysByType.ContainsKey(newType))
{
structArraysByType.Add(newType, new T[1000] ); // allowing up to 1k
}
}
public T get<T>( int index )
{
return ((T[])structArraysByType[typeof(T)])[index];
}
public void set<T>( int index, T newValue )
{
((T[])structArraysByType[typeof(T)])[index] = newValue;
}
Notes:
I need to ensure C# sees this as an array of value-types, instead of an array of objects ("don't you DARE go making an array of boxed objects around my structs!"). As I understand it: Generic T[] ensures that (as expected)
I couldn't figure out how to express the type "this will be an array of structs, but I can't tell you which structs at compile time" other than System.Array. System.Array works -- but maybe there are alternatives?
In order to index the resulting array, I have to typecast back to T[]. I am scared that this typecast MIGHT be boxing the Array-of-Structs; I know that if it were (T) instead of (T[]), it would definitely box; hopefully it doesn't do that with T[] ?
Alternatively, I can use the System.Array methods, which definitely box the incoming and outgoing struct. This is a fairly major problem (although I could work around it if it were the only way to make C# work with arrays of structs).
As far as I can see, what you are doing should work fine, but yes it will return a copy of a struct T instance when you call Get, and perform a replacement using a stack based instance when you call Set. Unless your structs are huge, this should not be a problem.
If they are huge and you want to
Read (some) properties of a struct instance in your array without creating a copy of it.
Update some of its fields (this assumes your structs are not immutable, which is generally a bad idea, but there are good reasons for doing it)
then you can add the following to your class:
public delegate void Accessor<T>(ref T item) where T : struct;
public delegate TResult Projector<T, TResult>(ref T item) where T : struct;
public void Access<T>(int index, Accessor<T> accessor)
{
var array = (T[])structArraysByType[typeof(T)];
accessor(ref array[index]);
}
public TResult Project<T, TResult>(int index, Projector<T, TResult> projector)
{
var array = (T[])structArraysByType[typeof(T)];
return projector(ref array[index]);
}
Or simply return a reference to the underlying array itself, if you don't need to abstract it / hide the fact that your class encapsulates them:
public T[] GetArray<T>()
{
return (T[])structArraysByType[typeof(T)];
}
From which you can then simply access the elements:
var myThingsArray = MyStructArraysType.GetArray<MyThing>();
var someFieldValue = myThingsArray[10].SomeField;
myThingsArray[3].AnotherField = "Hello";
Alternatively, if there is no specific reason for them to be structs (i.e. to ensure sequential cache friendly fast access), you might want to simply use classes.
There is a much better solution that is planned for adding to next version of C#, but does not yet exist in C# - the "return ref" feature of .NET already exists, but isn't supported by the C# compiler.
Here's the Issue for tracking that feature: https://github.com/dotnet/roslyn/issues/118
With that, the entire problem becomes trivial "return ref the result".
(answer added for future, when the existing answer will become outdated (I hope), and because there's still time to comment on that proposal / add to it / improve it!)
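For reference, ref returns did ship in C# 7.0, so the copy-free accessor can now be sketched directly against the question's structArraysByType field (holder is a hypothetical instance of the registry class):

```csharp
// Requires C# 7.0+ (ref returns).
public ref T GetRef<T>(int index)
{
    var array = (T[])structArraysByType[typeof(T)];
    return ref array[index];   // a reference into the array element, no copy
}

// Usage:
//   ref MyThing thing = ref holder.GetRef<MyThing>(10);
//   thing.SomeField = 42;    // writes through to the array element
```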
This question already has answers here:
Is there a constraint that restricts my generic method to numeric types?
Trying to create a generic lerp function in C#. My function currently looks like the one below.
public T Lerp<T>(T lhs, T rhs, float t)
{
return (1.0f - t) * lhs + t * rhs; // does not compile: T has no * or + operators
}
The question referenced as a duplicate constrains the type to built-in types. I don't want that constraint; I want to return an object of generic type T.
To do this requires a couple steps. We'll remove the ref keywords because they don't help in this case. Let's assume you wanted to LERP vectors. Let's also assume you have a set of vector objects called Vector2, Vector3, and Vector4 (for a 2 dimensional vector, 3d, and 4d respectively). To make the Lerp function work with all three of them, you have to adjust those objects first.
Create a base class for all three that performs the multiplication between a vector and a floating point number. The base class calls an abstract method that performs the actual math in the specific vector objects. The base class would look something like this:
public abstract class BaseVector
{
    public abstract BaseVector Multiply(float multiplier);
    public abstract BaseVector Add(BaseVector other);   // needed by the + operator below

    public static BaseVector operator *(BaseVector v, float multiplier) => v.Multiply(multiplier);
    public static BaseVector operator *(float multiplier, BaseVector v) => v.Multiply(multiplier);
    public static BaseVector operator +(BaseVector a, BaseVector b) => a.Add(b);
}
Now, for the function itself, it's a matter of providing a where clause to make use of the behavior of the base class. If your Vector2, Vector3, and Vector4 classes all extend BaseVector and implement the Multiply() method, then this will work:
public T Lerp<T>(T lhs, T rhs, float t)
    where T : BaseVector
{
    // The operators return BaseVector, so cast the result back to T.
    return (T)((1.0f - t) * lhs + t * rhs);
}
It's not going to get more generic than that, because operator overloads are static methods that have to be declared in the type they apply to. You could use an interface instead, and then any object implementing that interface could be used, but you would have to call the Multiply method declared in the interface directly. In that case, you would change the where clause so that T implements the interface; it's the same syntax, just with the type changed from BaseVector to the interface name.
You won't be able to call method names on the object that don't exist in the base Object type unless you specify the where clause.
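A sketch of the interface-based variant described above (the IVector name and members are illustrative, not an existing API):

```csharp
public interface IVector<T> where T : IVector<T>
{
    T Multiply(float multiplier);
    T Add(T other);
}

public static T Lerp<T>(T lhs, T rhs, float t) where T : IVector<T>
{
    // Call the interface methods directly instead of relying on operators,
    // so the result keeps the concrete type T with no casts.
    return lhs.Multiply(1.0f - t).Add(rhs.Multiply(t));
}
```

On C# 11 and later, static abstract interface members (as used by System.Numerics.INumber&lt;T&gt;) make the operator-based form possible as well.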
Answer from programmers.stackexchange.com:
If your intent is to figure out how to implement the LERP function with any passed in type, then your question is probably more appropriate for Stackoverflow.com.
Since linear interpolation is usually implemented with floating-point numbers, it's probably cleanest not to use a generic in this case. The result is a floating-point number:
// Precise method which guarantees v = v1 when t = 1.
public static float Lerp(float v0, float v1, float t)
{
return (1-t)*v0 + t*v1;
}
You might increase the precision, if you think it would be useful, by changing all the types to double. This keeps a simple function operating simply. The problem you'll run into is that operator overloading is static, so if you want to create a function that operates on a number of objects (for example, vectors), you'll have to create a base class that all those objects extend, define the operator * overload there, and provide a where clause restricting T to something that extends that base class. That can get complicated.
The point of this answer is to get you to think about why you feel you:
Need the function to be generic
Need the ref keyword
Both are complicating a pretty simple function. The ref keyword is unnecessary because you are only reading the parameters. The generic is unnecessary when you are working with numerical values because they all have implicit conversion operators when you are increasing precision. They also all have explicit conversion operators when you are decreasing precision. In short, unless you are working with a complex type, keep the method very simple and only break out the more complicated tricks when you really need it.
I'm toying with the idea of making primitive .NET value types more type-safe and more "self-documenting" by wrapping them in custom structs. However, I'm wondering if it's actually ever worth the effort in real-world software.
(That "effort" can be seen below: Having to apply the same code pattern again and again. We're declaring structs and so cannot use inheritance to remove code repetition; and since the overloaded operators must be declared static, they have to be defined for each type separately.)
Take this (admittedly trivial) example:
struct Area
{
public static implicit operator Area(double x) { return new Area(x); }
public static implicit operator double(Area area) { return area.x; }
private Area(double x) { this.x = x; }
private readonly double x;
}
struct Length
{
public static implicit operator Length(double x) { return new Length(x); }
public static implicit operator double(Length length) { return length.x; }
private Length(double x) { this.x = x; }
private readonly double x;
}
Both Area and Length are basically a double, but augment it with a specific meaning. If you defined a method such as…
Area CalculateAreaOfRectangleWith(Length width, Length height)
…it would not be possible to directly pass in an Area by accident. So far so good.
BUT: You can easily sidestep this apparently improved type safety simply by casting an Area to double, or by temporarily storing an Area in a double variable and then passing that into the method where a Length is expected:
Area a = 10.0;
double aWithEvilPowers = a;
… = CalculateAreaOfRectangleWith( (double)a, aWithEvilPowers );
Question: Does anyone here have experience with extensive use of such custom struct types in real-world / production software? If so:
Has the wrapping of primitive value types in custom structs ever directly resulted in less bugs, or in more maintainable code, or given any other major advantage(s)?
Or are the benefits of custom structs too small for them to be used in practice?
P.S.: About 5 years have passed since I asked this question. I'm posting some of my experiences that I've made since then as a separate answer.
I did this in a project a couple of years ago, with some mixed results. In my case, it helped a lot to keep track of different kinds of IDs, and to have compile-time errors when the wrong type of IDs were being used. And I can recall a number of occasions where it prevented actual bugs-in-the-making. So, that was the plus side. On the negative side, it was not very intuitive for other developers -- this kind of approach is not very common, and I think other developers got confused with all these new types springing up. Also, I seem to recall we had some problems with serialization, but I can't remember the details (sorry).
So if you are going to go this route, I would recommend a couple of things:
1) Make sure you talk with the other folks on your team first, explain what you're trying to accomplish and see if you can get "buy-in" from everyone. If people don't understand the value, you're going to be constantly fighting against the mentality of "what's the point of all this extra code"?
2) Consider generating your boilerplate code with a tool like T4. It will make maintaining the code much easier. In my case, we had about a dozen of these types, and going the code-generation route made changes much easier and much less error-prone.
That's my experience. Good luck!
John
This is a logical next step to Hungarian Notation, see an excellent article from Joel here http://www.joelonsoftware.com/articles/Wrong.html
We did something similar in one project/API where other developers especially needed to use some of our interfaces, and it resulted in significantly fewer "false alarm" support cases, because it made really obvious what was allowed/needed... I suspect this means measurably fewer bugs, though we never did the statistics because the resulting apps were not ours...
In the approximately five years since I asked this question, I have often toyed with defining struct value types, but rarely done it. Here are some reasons why:
It's not so much that their benefit is too small; it's that the cost of defining a struct value type is too high. There's lots of boilerplate code involved:
Override Equals and implement IEquatable<T>, and override GetHashCode too;
Implement operators == and !=;
Possibly implement IComparable<T> and operators <, <=, > and >= if a type supports ordering;
Override ToString, and implement conversion methods and type-cast operators from/to other related types.
This is simply a lot of repetitive work that is often not necessary when using the underlying primitive types directly.
Often, there is no reasonable default value (e.g. for value types such as zip codes, dates, times, etc.). Since one can't prohibit the default constructor, defining such types as struct is a bad idea and defining them as class means some runtime overhead (more work for the GC, more memory dereferencing, etc.)
I haven't actually found that wrapping a primitive type in a semantic struct really offers significant benefits with regard to type safety; IntelliSense / code completion and good choices of variable / parameter names can achieve much of the same benefits as specialized value types. On the other hand, as another answer suggests, custom value types can be unintuitive for some developers, so there's an additional cognitive overhead in using them.
There have been some instances where I ended up wrapping a primitive type in a struct, normally when there are certain operations defined on them (e.g. DecimalDegrees and Radians types with methods to convert between them).
Defining equality and comparison methods, on the other hand, does not necessarily mean that I'd define a custom value type. I might instead use primitive .NET types and provide well-named implementations of IEqualityComparer<T> and IComparer<T> instead.
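For instance, such a well-named comparer might look like this (the ZipCodeComparer name and normalization rules are made up for illustration):

```csharp
using System;
using System.Collections.Generic;

// Compares zip codes stored as plain strings, ignoring case and surrounding whitespace.
public sealed class ZipCodeComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) =>
        string.Equals(Normalize(x), Normalize(y), StringComparison.OrdinalIgnoreCase);

    public int GetHashCode(string s) =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(Normalize(s));

    private static string Normalize(string s) => s?.Trim() ?? string.Empty;
}

// var byZip = new Dictionary<string, Address>(new ZipCodeComparer());
```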
I don't often use structs.
One thing you may consider is to make your type require a dimension. Consider this:
public enum length_dim { meters, inches }
public enum area_dim { square_meters, square_inches }
public class Area {
    public Area(double a, area_dim dim) { this.area = a; this.dim = dim; }
    public Area(Area a) { this.area = a.area; this.dim = a.dim; }
    public Area(Length l1, Length l2)
    {
        Debug.Assert(l1.Dim == l2.Dim);
        this.area = l1.Distance * l2.Distance;
        switch (l1.Dim) {
            case length_dim.meters: this.dim = area_dim.square_meters; break;
            case length_dim.inches: this.dim = area_dim.square_inches; break;
        }
    }
    private double area;
    public double Value { get { return this.area; } } // a member can't share its enclosing type's name
    private area_dim dim;
    public area_dim Dim { get { return this.dim; } }
}
public class Length {
public Length(double dist,length_dim dim)
{ this.distance = dist; this.dim = dim; }
private length_dim dim;
public length_dim Dim { get { return this.dim; } }
private double distance;
public double Distance { get { return this.distance; } }
}
Notice that nothing can be created from a double alone; the dimension must be specified. My objects are immutable. Requiring the dimension and verifying it would have prevented the Mars Climate Orbiter failure.
I can't say, from my experience, whether this is a good idea or not. It certainly has its pros and cons. A pro is that you get an extra dose of type safety: why accept a double for an angle when you can accept a type that has true angle semantics (degrees to/from radians, constrained to degree values of 0-360, etc.)? A con: it's not common, so it could be confusing to some developers.
See this link for a real example of a real commercial product that has several types like you describe:
http://geoframework.codeplex.com/
Types include Angle, Latitude, Longitude, Speed, Distance.
This question already has answers here:
Possible Duplicate:
Structs, Interfaces and Boxing
From the MSDN: http://msdn.microsoft.com/en-us/library/yz2be5wk.aspx
Boxing is the process of converting a value type to the type object or to any interface type implemented by this value type.
But what about generic interfaces?
For example, int implements both IComparable and IComparable&lt;int&gt;.
Let's say I have the following code:
void foo(IComparable value) { /* etc. */ }
void bar<T>(IComparable<T> value) { /* etc. */ }
void gizmo()
{
int i = 42;
bar(i); // is `i` boxed? I'd say YES
foo(i); // is `i` boxed? I fear it is (but I hope for NO)
}
Does bar (or any function taking a generic interface) mean there will be boxing?
Does foo (or any function taking a non-generic interface) mean there will be boxing?
Thanks.
Any time a struct is cast to an interface, it is boxed. The purpose of IComparable<T> is to allow for something like:
void bar<T>(T value) where T : IComparable<T> { /* etc. */ }
When used in that fashion, the struct will be passed as a struct (via the generic type parameter) rather than as an interface, and thus will not have to be boxed. Note that depending upon the size of the struct, it may sometimes be better to pass by value and sometimes by reference, though of course if one is using an existing interface like IComparable one must pass as the interface demands.
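A sketch of the difference at the call site (SomeBigStruct is illustrative):

```csharp
using System;

struct SomeBigStruct : IComparable<SomeBigStruct>
{
    public int Value;
    public int CompareTo(SomeBigStruct other) => Value.CompareTo(other.Value);
}

static class Demo
{
    // Generic constraint: the runtime instantiates this for the struct itself, so no boxing.
    static int CompareGeneric<T>(T a, T b) where T : IComparable<T> => a.CompareTo(b);

    // Interface parameter: the struct is boxed when converted to the interface.
    static int CompareBoxed(IComparable<SomeBigStruct> a, SomeBigStruct b) => a.CompareTo(b);
}
```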
First, a short (and probably incomplete) primer on value types, reference types, and boxing.
You can tell that something is a value type because changes made in a function do not persist outside the function. The value of the object is copied when the function is called, and thrown away at the end of that function.
You can tell that something is a reference type because changes made in a function persist outside the function. The value of the object is not copied when the function is called, and exists after the end of that function.
If something is boxed, a single copy is made, and seated within a reference type. It effectively changes from a value type to a reference type.
Note that this all applies to instanced state, i.e. any non-static member data. Static members are not instanced state and have nothing to do with reference types, value types, or boxing. Methods and properties that don't use instanced state (for example, ones that use only local variables or static member data) will not operate differently on reference types, value types, or when boxing occurs.
Armed with that knowledge, here is how we can prove that boxing does occur when converting a struct to an interface (generic or not):
using System;
interface ISomeInterface<T>
{
void Foo();
T MyValue { get; }
}
struct SomeStruct : ISomeInterface<int>
{
public void Foo()
{
this.myValue++;
}
public int MyValue
{
get { return myValue; }
}
private int myValue;
}
class Program
{
static void SomeFunction(ISomeInterface<int> value)
{
value.Foo();
}
static void Main(string[] args)
{
SomeStruct test1 = new SomeStruct();
ISomeInterface<int> test2 = test1;
// Call with struct directly
SomeFunction(test1);
Console.WriteLine(test1.MyValue);
SomeFunction(test1);
Console.WriteLine(test1.MyValue);
// Call with struct converted to interface
SomeFunction(test2);
Console.WriteLine(test2.MyValue);
SomeFunction(test2);
Console.WriteLine(test2.MyValue);
}
}
The output looks like this:
0
0
1
2
This means that boxing occurs only when doing the conversion:
The first two calls do the boxing upon each call.
The second two calls already have a boxed copy, and boxing does not occur upon each call.
I won't bother duplicating all the code here, but if you change ISomeInterface<T> to ISomeInterface, you'll still have the same behavior.
Summary of answers
My confusion about generic interfaces and boxing/unboxing came from the fact that I knew C# generics enable us to produce more efficient code.
For example, the fact int implements IComparable<T> and IComparable meant to me:
IComparable was to be used with old, pre-generics code, but would mean boxing/unboxing
IComparable&lt;T&gt; was to be used with generics-enabled code, supposedly avoiding boxing/unboxing
Eric Lippert's comment is as simple, clear and direct as it can be:
Generic interface types are interface types. There's nothing special about them that magically prevents boxing
From now on, I know without doubt that casting a struct to an interface implies boxing.
But then, how was IComparable&lt;T&gt; supposed to work more efficiently than IComparable?
This is where supercat's answer (edited by Lasse V. Karlsen) pointed me to the fact generics were more like C++ templates than I thought:
The purpose of IComparable&lt;T&gt; is to allow for something like:
void bar<T>(T value) where T : IComparable<T> { /* etc. */ }
Which is quite different from:
void bar<T>(IComparable<T> value) { /* etc. */ }
Or even:
void bar(IComparable value) { /* etc. */ }
My guess is that for the first prototype, the runtime will generate one function per value type and thus avoid boxing when dealing with structs.
Whereas for the second prototype, the runtime will only generate functions taking an interface as a parameter, and as such will box when T is a struct. The third function will just box the struct, no more, no less.
(I guess this is where C# generics combined with C# structs show their superiority when compared with Java type-erasure generics implementation.)
Merlyn Morgan-Graham's answer provided me with a test example I'll play with at home. I'll complete this summary as soon as I have meaningful results. (I guess I'll try to use pass-by-reference semantics to see how all that works...)