I'm looking at the new C# feature of tuples. I'm curious, what problem was the tuple designed to solve?
What have you used tuples for in your apps?
Update
Thanks for the answers thus far, let me see if I have things straight in my mind.
A good example of a tuple has been pointed out as coordinates. Does this look right?
var coords = Tuple.Create(geoLat,geoLong);
Then use the tuple like so:
var myLatlng = new google.maps.LatLng("+ coords.Item1 + ", "+ coords.Item2 + ");
Is that correct?
When writing programs it is extremely common to want to logically group together a set of values which do not have sufficient commonality to justify making a class.
Many programming languages allow you to logically group together a set of otherwise unrelated values without creating a type in only one way:
void M(int foo, string bar, double blah)
Logically this is exactly the same as a method M that takes one argument which is a 3-tuple of int, string, double. But I hope you would not actually make:
class MArguments
{
public int Foo { get; private set; }
... etc
unless MArguments had some other meaning in the business logic.
The concept of "group together a bunch of otherwise unrelated data in some structure that is more lightweight than a class" is useful in many, many places, not just for formal parameter lists of methods. It's useful when a method has two things to return, or when you want to key a dictionary off of two data rather than one, and so on.
Languages like F# which support tuple types natively provide a great deal of flexibility to their users; they are an extremely useful set of data types. The BCL team decided to work with the F# team to standardize on one tuple type for the framework so that every language could benefit from them.
However, there is at this point no language support for tuples in C#. Tuples are just another data type like any other framework class; there's nothing special about them. We are considering adding better support for tuples in hypothetical future versions of C#. If anyone has any thoughts on what sort of features involving tuples you'd like to see, I'd be happy to pass them along to the design team. Realistic scenarios are more convincing than theoretical musings.
Tuples provide an immutable implementation of a collection
Aside from the common uses of tuples:
to group common values together without having to create a class
to return multiple values from a function/method
etc...
Immutable objects are inherently thread safe:
Immutable objects can be useful in multi-threaded applications. Multiple threads can act on data represented by immutable objects without concern of the data being changed by other threads. Immutable objects are therefore considered to be more thread-safe than mutable objects.
From "Immutable Object" on wikipedia
It provides an alternative to ref or out if you have a method that needs to return multiple new objects as part of its response.
It also allows you to use a built-in type as a return type if all you need to do is mash-up two or three existing types, and you don't want to have to add a class/struct just for this combination. (Ever wish a function could return an anonymous type? This is a partial answer to that situation.)
It's often helpful to have a "pair" type, just used in quick situations (like returning two values from a method). Tuples are a central part of functional languages like F#, and C# picked them up along the way.
very useful for returning two values from a function
Personally, I find Tuples to be an iterative part of development when you're in an investigative cycle, or just "playing". Because a Tuple is generic, I tend to think of it when working with generic parameters - especially when wanting to develop a generic piece of code, and I'm starting at the code end, instead of asking myself "how would I like this call to look?".
Quite often I realise that the collection that the Tuple forms become part of a list, and staring at List> doesn't really express the intention of the list, or how it works. I often "live" with it, but find myself wanting to manipulate the list, and change a value - at which point, I don't necessarily want to create a new Tuple for that, thus I need to create my own class or struct to hold it, so I can add manipulation code.
Of course, there's always extension methods - but quite often you don't want to extend that extra code to generic implementations.
There have been times I'm wanted to express data as a Tuple, and not had Tuples available. (VS2008) in which case I've just created my own Tuple class - and I don't make it thread safe (immutable).
So I guess I'm of the opinion that Tuples are lazy programming at the expense of losing a type name that describes it's purpose. The other expense is that you have to declare the signature of the Tuple whereever it's used as a parameter. After a number of methods that begin to look bloated, you may feel as I do, that it is worth making a class, as it cleans up the method signatures.
I tend to start by having the class as a public member of the class you're already working in. But the moment it extends beyond simply a collection of values, it get's it's own file, and I move it out of the containing class.
So in retrospect, I believe I use Tuples when I don't want to go off and write a class, and just want to think about what I've writing right now. Which means the signature of the Tuple may change quite a lot in the text half an hour whilst I figure out what data I am going to need for this method, and how it's returning what ever values it will return.
If I get a chance to refactor code, then often I'll question a Tuple's place in it.
Old question since 2010, and now in 2017 Dotnet changes and become more smart.
C# 7 introduces language support for tuples, which enables semantic names for the fields of a tuple using new, more efficient tuple types.
In vs 2017 and .Net 4.7 (or installing nuget package System.ValueTuple), you can create/use a tuple in a very efficient and simple way:
var person = (Id:"123", Name:"john"); //create tuble with two items
Console.WriteLine($"{person.Id} name:{person.Name}") //access its fields
Returning more than one value from a method:
public (double sum, double average) ComputeSumAndAverage(List<double> list)
{
var sum= list.Sum();
var average = sum/list.Count;
return (sum, average);
}
How to use:
var list=new List<double>{1,2,3};
var result = ComputeSumAndAverage(list);
Console.WriteLine($"Sum={result.sum} Average={result.average}");
For more details read: https://learn.microsoft.com/en-us/dotnet/csharp/tuples
A Tuple is often used to return multiple values from functions when you don’t want to create a specific type. If you're familiar with Python, Python has had this for a long time.
Returning more than one value from a function. getCoordinates() isn't very useful if it just returns x or y or z, but making a full class and object to hold three ints also seems pretty heavyweight.
A common use might be to avoid creating classes/structs that only contains 2 fields, instead you create a Tuple (or a KeyValuePair for now).
Usefull as a return value, avoid passing N out params...
I find the KeyValuePair refreshing in C# to iterate over the key value pairs in a Dictionary.
Its really helpful while returning values from functions. We can have multiple values back and this is quite a saver in some scenarios.
I stumbled upon this performance benchmark between Tuples and Key-Value pairs and probably you will find it interesting. In summary it says that Tuple has advantage because it is a class, therefore it is stored in the heap and not in the stack and when passed around as argument its pointer is the only thing that is going. But KeyValuePair is a structs so it is faster to allocate but it is slower when used.
http://www.dotnetperls.com/tuple-keyvaluepair
Related
I'm trying to understand the design decision behind this part of the language. I admit i'm very new to it all but this is something which caught me out initially and I was wondering if I'm missing an obvious reason. Consider the following code:
List<int> MyList = new List<int>() { 5, 4, 3, 2, 1 };
int[] MyArray = {5,4,3,2,1};
//Sort the list
MyList.Sort();
//This was an instance method
//Sort the Array
Array.Sort(MyArray);
//This was a static method
Why are they not both implemented in the same way - intuitively to me it would make more sense if they were both instance methods?
The question is interesting because it reveals details of the .NET type system. Like value types, string and delegate types, array types get special treatment in .NET. The most notable oddish behavior is that you never explicitly declare an array type. The compiler takes care of it for you with ample helpings of the jitter. System.Array is an abstract type, you'll get dedicated array types in the process of writing code. Either by explicitly creating a type[] or by using generic classes that have an array in their base implementation.
In a largish program, having hundreds of array types is not unusual. Which is okay, but there's overhead involved for each type. It is storage required for just the type, not the objects of it. The biggest chunk of it is the so-called 'method table'. In a nutshell, it is a list of pointers to each instance method of the type. Both the class loader and the jitter work together to fill this table. This is commonly known as the 'v-table' but isn't quite a match, the table contains pointers to methods that are both non-virtual and virtual.
You can see where this leads perhaps, the designers were worried about having lots of types with big method tables. So looked for ways to cut down on the overhead.
Array.Sort() was an obvious target.
The same issue is not relevant for generic types. A big nicety of generics, one of many, one method table can handle the method pointers for any type parameter of a reference type.
You are comparing two different types of 'object containers':
MyList is a generic collection of type List, a wrapper class, of type int, where the List<T> represents a strongly typed list of objects. The List class itself provides methods to search, sort, and manipulate its contained objects.
MyArray is a basic data structure of type Array. The Array does not provide the same rich set of methods as the List. Arrays can at the same time be single-dimensional, multidimensional or jagged, whilst Lists out of the box only are single-dimensional.
Take a look at this question, it provides a richer discussion about these data types: Array versus List<T>: When to use which?
Without asking someone who was involved in the design of the original platform it's hard to know. But, here's my guess.
In older languages, like C, arrays are dumb data structures - they have no code of their own. Instead, they're manipulated by outside methods. As you move into an Object oriented framework, the closest equivilent is a dumb object (with minimal methods) manipulated by static methods.
So, my guess is that the implementation of .NET Arrays is more a symptom of C style thinking in the early days of development than anything else.
This likely has to do with inheritance. The Array class cannot be manually derived from. But oddly, you can declare an array of anything at all and get an instance of System.Array that is strongly typed, even before generics allowed you to have strongly typed collections. Array seems to be one of those magic parts of the framework.
Also notice that none of the instance methods provided on an array massively modify the array. SetValue() seems to be the only one that changes anything. The Array class itself provides many static methods that can change the content of the array, like Reverse() and Sort(). Not sure if that's significant - maybe someone here can give some background as to why that's the case.
In contrast, List<T> (which wasn't around in the 1.0 framework days) and classes like ArrayList (which was around back then) are just run-of-the mill classes with no special meaning within the framework. They provide a common .Sort() instance method so that when you inherited from these classes, you'd get that functionality or could override it.
However, these kinds of sort methods have gone out of vogue anyway as extension methods like Linq's .OrderBy() style sorting have become the next evolution. You can query and sort arrays and Lists and any other enumerable object with the same mechanism now, which is really, really nice.
-- EDIT --
The other, more cynical answer may just be - that's how Java did it so Microsoft did it the same way in the 1.0 version of the framework since at that time they were busy playing catch-up.
One reason might be because Array.Sort was designed in .NET 1.0, which had no generics.
I'm not sure, but I'm thinking maybe just so that arrays are as close to Primitives as they can be.
I need to find the minimum between 3 values, and I ended up doing something like this:
Math.Min(Math.Min(val1, val2), val3)
It just seems a little silly to me, because other languages use variadic functions for this. I highly doubt this was an oversight though.
Is there any reason why a simple Min/Max function shoundn't be variadic? Are there performance implications? Is there a variadic version that I didn't notice?
If it is a collection (A subclass of IEnumerable<T>) one could easily use the functions in the System.Linq library
int min = new int[] {2,3,4,8}.Min();
Furthermore, it's easy to implement these methods on your own:
public static class Maths {
public static T Min<T> (params T[] vals) {
return vals.Min();
}
public static T Max<T> (params T[] vals) {
return vals.Max();
}
}
You can call these methods with just simple scalars so Maths.Min(14,25,13,2) would give 2.
These are the generic methods, so there is no need to implement this method for each type of numbers (int,float,...)
I think the basic reason why these methods are not implemented in general is that, every time you call this method, an array (or at least an IList object) should be created. By keeping low-level methods one can avoid the overhead. However, I agree one could add these methods to the Math class to make the life of some programmers easier.
CommuSoft has addressed how to accomplish the equivalent in C#, so I won't retread that part.
To specifically address your question "Why aren't C#'s Math.Min/Max variadic?", two thoughts come to mind.
First, Math.Min (and Math.Max) is not, in fact, a C# language feature, it is a .NET framework library feature. That may seem pedantic, but it is an important distinction. C# does not, in fact, provide any special purpose language feature for determining the minimum or maximum value between two (or more) potential values.
Secondly, as Eric Lippert has pointed out a number of times, language features (and presumably framework features) are not "removed" or actively excluded - all features are unimplemented until someone designs, implements, tests, documents and ships the feature. See here for an example.
Not being a .NET framework developer, I cannot speak to the actual decision process that occurred, but it seems like this is a classic case of a feature that simply never rose to the level of inclusion, similar to the sequence foreach "feature" Eric discusses in the provided link.
I think CommuSoft is providing a robust answer that is at least suited for people searching for something along these lines, and that should be accepted.
With that said, the reason is definitely to avoid the overhead necessary for the less likely use case that people want to compare a group rather than two values.
As pointed about by #arx, using a parametric would be unnecessary overhead for the most used case, but it would also be a lot of unnecessary overhead with regards to the loop that would have to be used internally to go through the array n - 1 times.
I can easily see an argument for having created the method in addition to the basic form, but with LINQ that's just no longer necessary.
I am running through some tests about using ArrayLists and List.
Speed is very important in my app.
I have tested creating 10000 records in each, finding an item by index and then updating that object for example:
List[i] = newX;
Using the arraylist seems much faster. Is that correct?
UPDATE:
Using the List[i] approach, for my List<T> approach I am using LINQ to find the index eg/
....
int index = base.FindIndex(x=>x.AlpaNumericString = "PopItem");
base[index] = UpdatedItem;
It is definately slower than
ArrayList.IndexOf("PopItem"))
base[index] = UpdatedItem;
A generic List (List<T>) should always be quicker than an ArrayList.
Firstly, an ArrayList is not strongly-typed and accepts types of object, so if you're storing value types in the ArrayList, they are going to be boxed and unboxed every time they are added or accessed.
A Generic List can be defined to accept only (say) int's so therefore no boxing or unboxing needs to occur when adding/accessing elements of the list.
If you're dealing with reference types, you're probably still better off with a Generic List over an ArrayList, since although there's no boxing/unboxing going on, your Generic List is type-safe, and there will be no implicit (or explicit) casts required when retrieving your strongly-typed object from the ArrayList's "collection" of object types.
There may be some edge-cases where an ArrayList is faster performing than a Generic List, however, I (personally) have not yet come across one. Even the MSDN documentation states:
Performance Considerations
In deciding whether to use the
List<(Of <(T>)>) or ArrayList class,
both of which have similar
functionality, remember that the
List<(Of <(T>)>) class performs better
in most cases and is type safe. If a
reference type is used for type T of
the List<(Of <(T>)>) class, the
behavior of the two classes is
identical. However, if a value type is
used for type T, you need to consider
implementation and boxing issues.
If a value type is used for type T,
the compiler generates an
implementation of the List<(Of <(T>)>)
class specifically for that value
type. That means a list element of a
List<(Of <(T>)>) object does not have
to be boxed before the element can be
used, and after about 500 list
elements are created the memory saved
not boxing list elements is greater
than the memory used to generate the
class implementation.
Make certain the value type used for
type T implements the IEquatable<(Of
<(T>)>) generic interface. If not,
methods such as Contains must call the
Object..::.Equals(Object) method,
which boxes the affected list element.
If the value type implements the
IComparable interface and you own the
source code, also implement the
IComparable<(Of <(T>)>) generic
interface to prevent the BinarySearch
and Sort methods from boxing list
elements. If you do not own the source
code, pass an IComparer<(Of <(T>)>)
object to the BinarySearch and Sort
methods
Moreover, I particularly like the very last section of that paragraph, which states:
It is to your advantage to use the type-specific implementation of the List<(Of <(T>)>) class instead of using the ArrayList class or writing a strongly typed wrapper collection yourself. The reason is your implementation must do what the .NET Framework does for you already, and the common language runtime can share Microsoft intermediate language code and metadata, which your implementation cannot.
Touché! :)
Based on your recent edit it seems as though you're not performing a 1:1 comparison here. In the List you have a class object and you're looking for the index based on a property, whereas in the ArrayList you just store the values of that property. If so, this is a severely flawed comparison.
To make it a 1:1 comparison you would add the values to the list only, not the class. Or, you would add the class items to the ArrayList. The former would allow you to use IndexOf on both collections. The latter would entail looping through your entire ArrayList and comparing each item till a match was found (and you could do the same for the List), or overriding object.Equals since ArrayList uses that for comparison.
For an interesting read, I suggest taking a look at Rico Mariani's post: Performance Quiz #7 -- Generics Improvements and Costs -- Solution. Even in that post Rico also emphasizes the need to benchmark different scenarios. No blanket statement is issued about ArrayLists, although the general consensus is to use generic lists for performance, type safety, and having a strongly typed collection.
Another related article is: Why should I use List and not ArrayList.
ArrayList seems faster? According to the documentation ( http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx ) List should be faster when using a value type, and the same speed when using a reference type. ArrayList is slower with value types because it needs to box/unbox the values when you're accessing them.
I would expect them to be about the same if they are value-types. There is an extra cast/type-check for ArrayList, but nothing huge. Of course, List<T> should be preferred. If speed is the primary concern (which it almost always isn't, at least not in this way), then you might also want to profile an array (T[]) - harder (=more expensive) to add/remove, of course - but if you are just querying/assigning by index, it should be the fastest. I have had to resort to arrays for some very localised performance critical work, but 99.95% of the time this is overkill and should be avoided.
For example, for any of the 3 approaches (List<T>/ArrayList/T[]) I would expect the assignment cost to be insignificant to the cost of newing up the new instance to put into the storage.
Marc Gravell touched on this in his anwswer - I think it needs to be stressed.
It is usually a waste of time to prematurely optimize your code!
A better approach is to do a simple, well designed first implementation, and test it with anticipated real world data loads.
Often, you will find that it's "fast enough". (It helps to start out with a clear definition of "fast enough" - e.g. "Must be able to find a single CD in a 10,000 CD collection in 3 seconds or less")
If it's not, put a profiler on it. Almost invariably, the bottle neck will NOT be where you expect.
(I learned this the hard way when I brought a whole app to it's knees with single badly chosen string concatenation)
In a recent project I was working I created a structure in my class to solve a problem I was having, as a colleague was looking over my shoulder he looked derisively at the structure and said "move it into a class".
I didn't have any argument for not moving it into a class other than I only need it in this class but this kind of falls down because couldn't I make it a nested class?
When is it ok to use a structure?
You should check out the value type usage guidelines: http://msdn.microsoft.com/en-us/library/y23b5415(vs.71).aspx
The article lists several important points but the few that I feel are the most valuable are the following
Is the value immutable?
Do you want the type to have value semantics?
If the answer to both questions is yes then you almost certainly want to use a Structure. Otherwise I would advise going with a class.
There are issues with using structures with a large amount of members. But I find that if I consider the two points above, rarely do I have more than the recommended number of members / size in my value types.
MSDN has a good guidelines document to cover structure usage. To summarize:
Act like primitive types.
Have an instance size under 16 bytes.
Are immutable.
Value semantics are desirable.
Otherwise, use a class.
You should always use a Class as your first choice, changing to Structure only for very specific reasons (as others have already outlined).
Depending on how much you "only need it in this class", you might be able to avoid the nested type completely by using an anonymous type; this will only work within a single method:
Public Class Foo
Public Sub Bar
Dim baz = New With { .Str = "String", .I = 314 }
End Sub
End Class
you can't (readily--there are a few things you can do with generics) move the instance baz outside of the Sub in a typesafe manner. Of course an Object can hold anything, even an instance of an anonymous type.
I think structures are great if you need copy the object or do not want it to be modified by the passed function. Since passed functions can not modify the originally passed structure instead got a new copy of it, this can be a life saver. (unless they passed as ByRef obviously) and can save you trouble of deep copy craziness in .NET or implementing pain of an ICloneSomething implementation.
But the general idea is defining a custom data structure in a more semantic way.
About moving to a class, if you are moving into a class where it'll be part of a class, generally this is good practice since your structure is 99% of the time related with one of you classes not related with a namespace.
If you are converting it to a class then you need to consider "is it defining a data strcuture" and "is it expensive?" since it's gonna be copied all over the place, "do you want to get affected by modifications done by the passers?"
The usage guidelines referenced by Marc and Rex are excellent and nicely cover cases where you aren't sure which one you would want. I will list some use cases where use of a struct is a requirement.
When you need to set the layout of the fields in memory
Interop with unmanaged code.
When you want to make Unions.
You need a fixed size buffer inlined.
You want to be able to do the equivalent of a reinterpret_cast with relative safety (so long as the struct does not contain any fields which are themselves reference types.
These are normally edge cases and (with the exception of interop) not recommended practices unless their use is necessary for the success of the project/program.
I'm just wondering how other developers tackle this issue of getting 2 or 3 answers from a method.
1) return a object[]
2) return a custom class
3) use an out or ref keyword on multiple variables
4) write or borrow (F#) a simple Tuple<> generic class
http://slideguitarist.blogspot.com/2008/02/whats-f-tuple.html
I'm working on some code now that does data refreshes. From the method that does the refresh I would like to pass back (1) Refresh Start Time and (2) Refresh End Time.
At a later date I may want to pass back a third value.
Thoughts? Any good practices from open source .NET projects on this topic?
It entirely depends on what the results are. If they are related to one another, I'd usually create a custom class.
If they're not really related, I'd either use an out parameter or split the method up. If a method wants to return three unrelated items, it's probably doing too much. The exception to this is when you're talking across a web-service boundary or something else where a "purer" API may be too chatty.
For two, usually 4)
More than that, 2)
Your question points to the possibility that you'll be returning more data in the future, so I would recommend implementing your own class to contain the data.
What this means is that your method signature will remain the same even if the inner representation of the object you're passing around changes to accommodate more data. It's also good practice for readability and encapsulation reasons.
Code Architeture wise i'd always go with a Custom Class when needing somewhat a specific amount of variables changed. Why? Simply because a Class is actually a "blueprint" of an often used data type, creating your own data type, which it in this case is, will help you getting a good structure and helping others programme for your interface.
Personally, I hate out/ref params, so I'd rather not use that approach. Also, most of the time, if you need to return more than one result, you are probably doing something wrong.
If it really is unavoidable, you will probably be happiest in the long run writing a custom class. Returning an array is tempting as it is easy and effective in the short teerm, but using a class gives you the option of changing the return type in the future without having to worry to much about causing problems down stream. Imagine the potential for a debugging nightmare if someone swaps the order of two elements in the array that is returned....
I use out if it's only 1 or 2 additional variables (for example, a function returns a bool that is the actual important result, but also a long as an out parameter to return how long the function ran, for logging purposes).
For anything more complicated, i usually create a custom struct/class.
I think the most common way a C# programmer would do this would be to wrap the items you want to return in a separate class. This would provide you with the most flexibility going forward, IMHO.
It depends. For an internal only API, I'll usually choose the easiest option. Generally that's out.
For a public API, a custom class usually makes more sense - but if it's something fairly primitive, or the natural result of the function is a boolean (like *.TryParse) I'll stick with an out param. You can do a custom class with an implicit cast to bool as well, but that's usually just weird.
For your particular situation, a simple immutable DateRange class seems most appropriate to me. You can easily add that new value without disturbing existing users.
If you're wanting to send back the refresh start and end times, that suggests a possible class or struct, perhaps called DataRefreshResults. If your possible third value is also related to the refresh, then it could be added. Remember, a struct is always passed by value, so it's allocated on the heap does not need to be garbage-collected.
Some people use KeyValuePair for two values. It's not great though because it just labels the two things as Key and Value. Not very descriptive. Also it would seriously benefit from having this added:
public static class KeyValuePair
{
public static KeyValuePair<K, V> Make(K k, V v)
{
return new KeyValuePair<K, V>(k, v);
}
}
Saves you from having to specify the types when you create one. Generic methods can infer types, generic class constructors can't.
For your scenario you may want to define generic Range{T} class (with checks for the range validity).
If method is private, then I usually use tuples from my helper library. Public or protected methods generally always deserve separate.
Return a custom type, but don't use a class, use a struct - no memory allocation/garbage collection overhead means no downsides.
If 2, a Pair.
If more than 2 a class.
Another solution is to return a dictionary of named object references. To me, this is pretty equivalent to using a custom return class, but without the clutter. (And using RTTI and reflection it is just as typesafe as any other solution, albeit dynamically so.)
It depends on the type and meaning of the results, as well as whether the method is private or not.
For private methods, I usually just use a Tuple, from my class library.
For public/protected/internal methods (ie. not private), I use either out parameter or a custom class.
For instance, if I'm implementing the TryXYZ pattern, where you have an XYZ method that throws an exception on failure and a TryXYZ method that returns Boolean, TryXYZ will use an out parameter.
If the results are sequence-oriented (ie. return 3 customers that should be processed) then I will typically return some kind of collection.
Other than that I usually just use a custom class.
If a method outputs two to three related value, I would group them in a type. If the values are unrelated, the method is most likely doing way too much and I would refactor it into a number of simpler methods.