System.Linq.Lookup vs. Wintellect.PowerCollections.MultiDictionary - c#

I still use Wintellect's PowerCollections library, even though it is aging and no longer maintained, because it did a good job of covering holes left in the standard MS collections libraries. But LINQ and C# 4.0 are poised to replace PowerCollections...
I was very happy to discover System.Linq.Lookup because it should replace Wintellect.PowerCollections.MultiDictionary in my toolkit. But Lookup seems to be immutable! Is that true? Can you only create a populated Lookup by calling ToLookup?

Yes, you can only create a Lookup by calling ToLookup. The immutable nature of it means that it's easy to share across threads etc, of course.
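For illustration, here's a minimal sketch of that workflow (the word list and key selector are invented for the example):

    using System;
    using System.Linq;

    class LookupDemo
    {
        static void Main()
        {
            string[] words = { "apple", "avocado", "banana", "blueberry" };

            // The only public route to a populated Lookup: ToLookup.
            ILookup<char, string> byFirstLetter = words.ToLookup(w => w[0]);

            foreach (string word in byFirstLetter['a'])
                Console.WriteLine(word);                    // apple, avocado

            // Unlike Dictionary, a missing key just yields an empty sequence;
            // there is no Add or Remove, so the lookup is fixed at creation.
            Console.WriteLine(byFirstLetter['z'].Count());  // 0
        }
    }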
If you want a mutable version, you could always use the Edulinq implementation as a starting point. It's internally mutable, but externally immutable - and I wouldn't be surprised if the Microsoft implementation worked in a similar way.
Personally I'm rarely in a situation where I want to mutate the lookup - I would prefer to perform appropriate transformations on the input first. I would encourage you to think in this way too - I find myself wishing for better immutability support from other collections (e.g. Dictionary) more often than I wish that Lookup were mutable :)

That is correct. Lookup is immutable; you can create an instance by using the LINQ ToLookup() extension method. Technically, even that fact is an implementation detail, since the method returns an ILookup interface which in the future might be implemented by some other concrete class.

Related

Why are so many simple types in the .Net framework not marked as serializable?

I find it a recurring inconvenience that a lot of simple types in the .Net framework are not marked as serializable. For example: System.Drawing.Point or Rectangle.
Both those structs only consist of primitive data and should be serializable in any format easily. However, because of the missing [System.Serializable] attribute, I can't use them with a BinaryFormatter.
Is there any reason for this, which I'm not seeing?
It is simply a question of efficiency. When a field is tagged as serializable, the compiler must map it onto a table of aliases. If every type were marked serializable, every object injecting or inheriting those types would need to be mapped onto the table of aliases as well in order to process its serialization, even though you will probably never use it; that has a cost in memory and processing, and it is more unsafe. Test it with millions of elements and you will see.
Personally, I believe it has less to do with the need to pass the buck, and more to do with usefulness and actual use, coupled with the fact that the .NET Framework is simply that, a framework. It is designed to be a stepping stone which provides you the basics to complete tasks that would otherwise be daunting in other languages, rather than doing everything for you.
There really isn't anything stopping you from creating your own serialization mechanisms and extensions which provide the functionality you're seeking, or relying on one of the many other products out there, FOSS or paid, which achieve this for you out of the box.
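For example, BinaryFormatter already has an extension point for exactly this: a serialization surrogate can teach it to handle a type the framework never marked [Serializable]. A minimal sketch for Point (the class names here are my own):

    using System;
    using System.Drawing;
    using System.IO;
    using System.Runtime.Serialization;
    using System.Runtime.Serialization.Formatters.Binary;

    // Teaches BinaryFormatter how to persist the non-[Serializable] Point.
    class PointSurrogate : ISerializationSurrogate
    {
        public void GetObjectData(object obj, SerializationInfo info, StreamingContext context)
        {
            Point p = (Point)obj;
            info.AddValue("x", p.X);
            info.AddValue("y", p.Y);
        }

        public object SetObjectData(object obj, SerializationInfo info, StreamingContext context, ISurrogateSelector selector)
        {
            return new Point(info.GetInt32("x"), info.GetInt32("y"));
        }
    }

    class SurrogateDemo
    {
        static void Main()
        {
            SurrogateSelector selector = new SurrogateSelector();
            selector.AddSurrogate(typeof(Point),
                new StreamingContext(StreamingContextStates.All), new PointSurrogate());

            BinaryFormatter formatter = new BinaryFormatter();
            formatter.SurrogateSelector = selector;

            using (MemoryStream stream = new MemoryStream())
            {
                formatter.Serialize(stream, new Point(3, 4));
                stream.Position = 0;
                Point roundTripped = (Point)formatter.Deserialize(stream);
                Console.WriteLine(roundTripped);  // {X=3,Y=4}
            }
        }
    }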
Granted, @Hans Passant's answer is, I think, very close to the truth, but there are a lot of other facets to this which go beyond just "It's not my problem." You can take it whatever way you want, but the ultimate thing you need to get out of it is: "where can I go from here?"

What is the benefit of using type-safe collection classes?

I was wondering why, on some occasions, I see a class representing a collection of some type.
For example:
In Microsoft XNA Framework: TextureCollection, TouchCollection, etc.
Also other classes in the .NET framework itself, ending with Collection.
Why is it designed this way? What are the benefits of doing it this way rather than using a generic collection type, like those introduced in C# 2.0?
Thanks
The examples you gave are good ones. TextureCollection is sealed and has no public constructor, only an internal one. TouchCollection implements IList<TouchLocation>, similar to the way List<T> implements IList<T>. Generics are at work here, btw; the upvoted answer isn't correct.
TextureCollection is intentionally crippled: it makes sure that you can never create an instance of it. Only secret knowledge about textures can fill this collection; a List<> wouldn't suffice, since it cannot be initialized with that secret knowledge that makes the indexer work. Nor does the class need to be generic; it only knows about Texture class instances.
The TouchCollection is similarly specialized. Its Add() method throws a NotSupportedException. This cannot be done with a regular List<> class; its Add() method isn't virtual, so it cannot be overridden to throw the exception.
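You can get the same effect yourself through the framework's intended extensibility point, Collection<T>, which routes all mutations through protected virtual methods. A sketch, with an invented Touch struct standing in for XNA's TouchLocation:

    using System;
    using System.Collections.Generic;
    using System.Collections.ObjectModel;

    public struct Touch { public int Id; }  // stand-in for XNA's TouchLocation

    // Callers can read, enumerate, and index into it, but every mutation
    // after construction throws, much like TouchCollection's Add() does.
    public sealed class FrozenTouchCollection : Collection<Touch>
    {
        private readonly bool frozen;

        public FrozenTouchCollection(IEnumerable<Touch> touches)
        {
            foreach (Touch t in touches)
                Add(t);        // allowed: frozen is still false here
            frozen = true;
        }

        protected override void InsertItem(int index, Touch item)
        {
            if (frozen) throw new NotSupportedException();
            base.InsertItem(index, item);
        }

        protected override void RemoveItem(int index) { throw new NotSupportedException(); }
        protected override void SetItem(int index, Touch item) { throw new NotSupportedException(); }
        protected override void ClearItems() { throw new NotSupportedException(); }
    }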
This is not unusual.
In the .NET framework itself, many type-safe collections predate 2.0 Generics, and are kept for compatibility.
For several XAML-related contexts, there's either no syntax to specify a generic class, or the syntax is cumbersome. Therefore, where List<T> would be used, a specific TList is written for each need.
It allows you to define your own semantics on the collection (you may not want to have an Add or AddRange method etc...).
Additionally, readability is increased by not having your code littered with List<Touch> and List<Texture> everywhere.
There is also quite a lot of .NET 1.0/1.1 code that still needs to work, so the older collections that predate generics still need to exist.
It's not that easy to use generic classes in XAML for example.
Following on from Oded's answer, your own class type allows for much easier change down the track when you decide you want a stack or queue instead of that List. There can be lots of reasons for this, including performance, memory use, etc.
In fact, it's usually a good idea to hide that type of implementation detail - users of your class just want to know that it stores Textures, not how.

Dynamic Lang. Runtime vs Reflection

I am planning to use dynamic keyword for my new project. But before stepping in, I would like to know about the pros and cons in using dynamic keyword over Reflection.
The following are the pros I could find with respect to the dynamic keyword:
Readable\Maintainable code.
Fewer lines of code.
And these are the negatives I have heard associated with using the dynamic keyword:
Affects application performance.
Dynamic keyword is internally a wrapper of Reflection.
Dynamic typing might turn into breeding ground for hard to find bugs.
Affects interoperability with previous .NET versions.
Please help me decide whether the pros and cons I came across are sensible or not.
Please help me decide whether the pros and cons I came across are sensible or not.
The concern I have with your pros and cons is that some of them do not address differences between using reflection and using dynamic. That dynamic typing makes for bugs that are not caught until runtime is true of any dynamic typing system. Reflection code is just as likely to have a bug as code that uses the dynamic type.
Rather than thinking of it in terms of pros and cons, think about it in more neutral terms. The question I'd ask is "What are the differences between using Reflection and using the dynamic type?"
First: with Reflection you get exactly what you asked for. With dynamic, you get what the C# compiler would have done had it been given the type information at compile time. Those are potentially two completely different things. If you have a MethodInfo to a particular method, and you invoke that method with a particular argument, then that is the method that gets invoked, period. If you use "dynamic", then you are asking the DLR to work out at runtime what the C# compiler's opinion is about which is the right method to call. The C# compiler might pick a method different than the one you actually wanted.
Second: with Reflection you can (if your code is granted suitably high levels of trust) do private reflection. You can invoke private methods, read private fields, and so on. Whether doing so is a good idea, I don't know. It certainly seems dangerous and foolish to me, but I don't know what your application is. With dynamic, you get the behaviour that you'd get from the C# compiler; private methods and fields are not visible.
Third: with Reflection, the code you write looks like a mechanism. It looks like you are loading a metadata source, extracting some types, extracting some method infos, and invoking methods on receiver objects through the method info. Every step of the way looks like the operation of a mechanism. With dynamic, every step of the way looks like business logic. You invoke a method on a receiver the same way as you'd do it in any other code. What is important? In some code, the mechanism is actually the most important thing. In some code, the business logic that the mechanism implements is the most important thing. Choose the technique that emphasises the right level of abstraction.
Fourth: the performance costs are different. With Reflection you do not get any cached behaviour, which means that operations are generally slower, but there is no memory cost for maintaining the cache and every operation is roughly the same cost. With the DLR, the first operation is very slow indeed as it does a huge amount of analysis, but the analysis is cached and reused. That consumes memory, in exchange for increased speed in subsequent calls in some scenarios. What the right balance of speed and memory usage is for your application, I don't know.
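The first difference is easy to demonstrate. In this sketch (the Printer class is invented for the example), the reflection call invokes exactly the MethodInfo that was selected, while the dynamic call re-runs C# overload resolution at runtime against the argument's runtime type:

    using System;
    using System.Reflection;

    class Printer
    {
        public void Print(object o) { Console.WriteLine("object overload"); }
        public void Print(string s) { Console.WriteLine("string overload"); }
    }

    class OverloadDemo
    {
        static void Main()
        {
            object arg = "hello";
            Printer printer = new Printer();

            // Reflection: exactly the MethodInfo you selected is invoked.
            MethodInfo objectOverload =
                typeof(Printer).GetMethod("Print", new[] { typeof(object) });
            objectOverload.Invoke(printer, new[] { arg });  // "object overload"

            // dynamic: the DLR asks "what would the C# compiler have done?",
            // using the runtime type of the argument, and picks Print(string).
            dynamic d = printer;
            d.Print(arg);                                   // "string overload"
        }
    }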
Readable\Maintainable code
Certainly true in my experience.
Fewer lines of code.
Not significantly, but it will help.
Affects application performance.
Very slightly. But not even close to the way reflection does.
Dynamic keyword is internally a wrapper of Reflection.
Completely untrue. The dynamic keyword leverages the Dynamic Language Runtime (DLR).
[Edit: correction as per comment below]
It would seem that the Dynamic Language Runtime does use Reflection, and the performance improvements are only due to caching techniques.
Dynamic typing might turn into breeding ground for hard to find bugs.
This may be true; it depends how you write your code. You are effectively removing compiler checking from your code. If your test coverage is good, this probably won't matter; if not then I suspect you will run into problems.
Affects interoperability with previous .NET versions
Not true. I mean you won't be able to compile your code against older versions, but if you want to do that then you should use the old versions as a base and up-compile it rather than the other way around. But if you want to use a .NET 2 library then you shouldn't run into too many problems, as long as you include the declaration in app.config / web.config.
One significant pro that you're missing is the improved interoperability with COM/ATL components.
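For example, late-bound Office automation, which previously required verbose reflection plumbing, becomes direct member access (this sketch assumes Excel is installed; each call on excel is dispatched at runtime through the DLR to COM's IDispatch):

    using System;

    class ExcelDemo
    {
        static void Main()
        {
            Type excelType = Type.GetTypeFromProgID("Excel.Application");
            dynamic excel = Activator.CreateInstance(excelType);

            excel.Visible = true;    // no PIA, no MethodInfo, no Invoke
            excel.Workbooks.Add();
            excel.Quit();
        }
    }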
There are four key differences between dynamic and Reflection. Below is a detailed explanation of each. Reference: http://www.codeproject.com/Articles/593881/What-is-the-difference-between-Reflection-and-Dyna
Point 1. Inspect VS Invoke
Reflection can do two things: it can inspect metadata, and it can also invoke methods at runtime. With dynamic we can only invoke methods. So if I am creating software like the Visual Studio IDE, then reflection is the way to go. If I just want dynamic invocation from my C# code, dynamic is the best option.
Point 2. Private Vs Public Invoke
You cannot invoke private methods using dynamic. With reflection it is possible to invoke private methods.
Point 3. Caching
Dynamic uses reflection internally, and it also adds caching benefits. So if you just want to invoke an object dynamically, dynamic is the best choice, as you get performance benefits.
Point 4. Static classes
Dynamic is instance specific: You don't have access to static members; you have to use Reflection in those scenarios.
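A sketch of that fourth point: there is no receiver expression that lets dynamic bind to a static method, so Reflection remains the tool (the MathHelper class is invented for the example):

    using System;
    using System.Reflection;

    static class MathHelper
    {
        public static int Square(int x) { return x * x; }
    }

    class StaticDemo
    {
        static void Main()
        {
            // There is no dynamic expression that resolves to MathHelper's
            // static members; "dynamic d = typeof(MathHelper)" would bind
            // against System.Type's instance members instead. Reflection works:
            MethodInfo square = typeof(MathHelper).GetMethod("Square");
            int result = (int)square.Invoke(null, new object[] { 5 });
            Console.WriteLine(result);  // 25
        }
    }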
In most cases, using the dynamic keyword will not result in meaningfully shorter code. In some cases it will; that depends on the provider and as such it's an important distinction. You should probably never use the dynamic keyword to access plain CLR objects; the benefit there is too small.
The dynamic keyword undermines automatic refactoring tools and makes high-coverage unit tests more important; after all, the compiler isn't checking much of anything when you use it. That's not as much of an issue when you're interoperating with a very stable or inherently dynamically typed API, but it's particularly nasty if you use keyword dynamic to access a library whose API might change in the future (such as any code you yourself write).
Use the keyword sparingly, where it makes sense, and make sure such code has ample unit tests. Don't use it where it's not needed or where type inference (e.g. var) can do the same.
Edit: You mention below that you're doing this for plug-ins. The Managed Extensibility Framework was designed with this in mind - it may be a better option than keyword dynamic and reflection.
If you are using dynamic specifically to do reflection, your only concern is compatibility with previous versions. Otherwise it wins over reflection, because it is more readable and shorter. You will lose strong typing and (some) performance from the very use of reflection anyway.
The way I see it, all your cons for using dynamic, except interoperability with older .NET versions, are also present when using Reflection:
Affects application performance
While it does affect performance, so does using Reflection. From what I remember, the DLR more or less uses Reflection the first time you access a method/property of your dynamic object for a given type, and caches the type/access-target pair so that later access is just a lookup in the cache, making it faster than Reflection.
Dynamic keyword is internally a wrapper of Reflection
Even if it was true (see above), how would that be a negative point? Whether or not it does wrap Reflection shouldn't influence your application in any significant matter.
Dynamic typing might turn into breeding ground for hard to find bugs
While this is true, as long as you use it sparingly it shouldn't be that much of a problem. Furthermore, if you basically use it as a replacement for reflection (that is, you use dynamic only for the briefest possible durations when you want to access something via reflection), the risk of such bugs shouldn't be significantly higher than if you used reflection to access your methods/properties (of course, if you make everything dynamic it can be more of a problem).
Affects interoperability with previous .NET versions
For that you have to decide yourself how much of a concern it is for you.

Why GetHashCode is in Object class?

Why is GetHashCode part of the Object class? Only a small fraction of objects are ever used as keys in hash tables. Wouldn't it be better to have a separate interface which must be implemented when we want objects of a class to serve as keys in a hash table?
There must be a reason that the MS team decided to include this method in the Object class and thus make it available "everywhere".
It was a design mistake copied from Java, IMO.
In my perfect world:
ToString would be renamed ToDebugString to set expectations appropriately
Equals and GetHashCode would be gone
There would be a ReferenceEqualityComparer implementation of IEqualityComparer<T>: the equals part of this is easy at the moment, but there's no way of getting an "original" hash code if it's overridden
Objects wouldn't have monitors associated with them: Monitor would have a constructor, and Enter/Exit etc would be instance methods.
Equality (and thus hashing) cause problems in inheritance hierarchies in general - so long as you can always specify the kind of comparison you want to use (via IEqualityComparer<T>) and objects can implement IEquatable<T> themselves if they want to, I don't see why it should be on Object. EqualityComparer<T>.Default could use the reference implementation if T didn't implement IEquatable<T> and defer to the objects otherwise. Life would be pleasant.
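As a small sketch of that style, the comparison policy lives in an IEqualityComparer<T> handed to the collection rather than in the element type itself (this comparer just wraps the built-in ordinal-ignore-case comparer to show the shape):

    using System;
    using System.Collections.Generic;

    // The comparison policy lives outside the element type. This one simply
    // delegates to the built-in ordinal-ignore-case comparer.
    class CaseInsensitiveNameComparer : IEqualityComparer<string>
    {
        public bool Equals(string x, string y)
        {
            return string.Equals(x, y, StringComparison.OrdinalIgnoreCase);
        }

        public int GetHashCode(string obj)
        {
            return StringComparer.OrdinalIgnoreCase.GetHashCode(obj);
        }
    }

    class ComparerDemo
    {
        static void Main()
        {
            HashSet<string> names = new HashSet<string>(new CaseInsensitiveNameComparer());
            names.Add("Alice");
            Console.WriteLine(names.Contains("ALICE"));  // True
        }
    }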
Ah well. While I'm at it, array covariance was another platform mistake. If you want language mistakes in C#, I can start another minor rant if you like ;) (It's still by far my favourite language, but there are things I wish had been done differently.)
I've blogged about this elsewhere, btw.
Only a small fraction of objects are ever used as keys in hash tables
I would argue that this is not a true statement. Many classes are often used as keys in hash tables - and object references themselves are very often used. Having the default implementation of GetHashCode exist in System.Object means that ANY object can be used as a key, without restrictions.
This seems much nicer than forcing a custom interface on objects, just to be able to hash them. You never know when you may need to use an object as the key in a hashed collection.
This is especially true when using things like HashSet<T> - In this case, often, an object reference is used for tracking and uniqueness, not necessarily as a "key". Had hashing required a custom interface, many classes would become much less useful.
It allows any object to be used as a key by "identity". This is beneficial in some cases, and harmful in none. So, why not?
So anything can be keyed on. (Sorta)
That way Hashtable can take any object, versus only something that implements an IHashable interface, for example.
To drive simple equality comparison.
On objects that don't implement it directly, it defaults to .NET's internal hash code, which I believe is either a unique ID for the object instance or a hash of the memory footprint it takes up. (I cannot remember, and .NET Reflector can't go past the .NET component of the class.)
GetHashCode is in Object so that you can use anything as a key into a Hashtable, a basic container class. It provides symmetry. I can put anything into an ArrayList; why not a Hashtable?
If you require classes to implement IHashable, then for every sealed class that doesn't implement IHashable, you will be writing adapters that include the hashing capability whenever you want to use one as a key. Instead, you get it by default.
Also, hash codes are a good second line for object equality comparison (the first line is pointer equality).
If every class has GetHashCode, you can put every object in a hash. Imagine you have to use third-party objects (which you can't modify) and want to put them into a hash. If these objects didn't implement your fictional IHashable, you couldn't do it. This is obviously a bad thing ;)
Just a guess, but the garbage collector may store hashtables of some objects internally (perhaps to keep track of finalizable objects), which means any object needs to have a hash key.

Why doesn't ReadOnlyCollection<> include methods like FindAll(), FindFirst(),

Following the suggestions of FxCop and my personal inclination, I've been encouraging the team I'm coaching to use ReadOnlyCollections as much as possible, if only so that recipients of the lists can't modify their content. In theory this is bread & butter. The problem is that the List<> interface is much richer, exposing all sorts of useful methods. Why did they make that choice?
Do you just give up and return writable collections? Do you return readonly collections and then wrap them in the writable variety? Ahhhhh.
Update:
Thanks, I'm familiar with the Framework Design Guidelines, and that's why the team is using FxCop to enforce them. However, this team is living with VS 2005 (I know, I know), so telling them that LINQ/extension methods would solve their problems just makes them sad.
They've learned that List.FindAll() and .FindFirst() provide greater clarity than writing a foreach loop. Now that I'm pushing them to use ReadOnlyCollections, they lose that clarity.
Maybe there is a deeper design problem that I'm not spotting.
-- Sorry, the original post should have mentioned the VS2005 restriction. I've lived with it for so long that I just don't notice.
Section 8.3.2 of the .NET Framework Design Guidelines Second Edition:
DO use ReadOnlyCollection<T>, a subclass of ReadOnlyCollection<T>, or in rare cases IEnumerable<T> for properties or return values representing read-only collections.
We go with ReadOnlyCollections to express our intent for the collection returned.
The List<T> methods you speak of were added in .NET 2.0 for convenience. In C# 3.0 / .NET 3.5, you can get all those methods back on ReadOnlyCollection<T> (or any IEnumerable<T>) using extension methods (and use LINQ operators as well), so I don't think there's any motivation for adding them natively to other types. The fact that they exist at all on List<T> is a historical note: extension methods are available now but weren't in 2.0.
First off, ReadOnlyCollection<T> does implement IEnumerable<T> and IList<T>. With all of the extension methods in .NET 3.5 and LINQ, you have access to nearly all of the functionality from the original List<T> class in terms of querying, which is all you should do with a ReadOnlyCollection<T> anyway.
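For instance, the List<T> conveniences map one-for-one onto the .NET 3.5 extension methods, which work against a ReadOnlyCollection<T> unchanged (a minimal sketch):

    using System;
    using System.Collections.Generic;
    using System.Collections.ObjectModel;
    using System.Linq;

    class QueryDemo
    {
        static void Main()
        {
            ReadOnlyCollection<int> numbers =
                new List<int> { 3, 1, 4, 1, 5, 9 }.AsReadOnly();

            // List<T>.FindAll and List<T>.Find map straight onto Where and First:
            List<int> evens = numbers.Where(n => n % 2 == 0).ToList();
            int firstOdd = numbers.First(n => n % 2 != 0);

            Console.WriteLine("{0} even(s), first odd = {1}", evens.Count, firstOdd);
        }
    }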
That being said, your initial question leads me to make some suggestions...
Returning List<T> is bad design, so it shouldn't be a point of comparison. List<T> should be used for implementation, but for the interface, IList<T> should be returned. The Framework Design Guidelines specifically state:
"DO NOT use ArrayList or List<T> in public APIs." (Page 251)
If you take that into consideration, there is absolutely no disadvantage to ReadOnlyCollection<T> when compared to List<T>. Both of these classes implement IEnumerable<T> and IList<T>, which are the interfaces that should be returned anyway.
I don't have any insight as to why they weren't originally added. But now that we have LINQ I certainly see no reason to add them in future versions of the language. The methods you mentioned can easily be written in a LINQ query today. These days I just use the LINQ queries for pretty much everything. I actually more often get annoyed with List<T> having those methods because it conflicts with extension methods I write against IEnumerable<T>.
I think Jeff's answer kinda contains the answer you need; instead of ReadOnlyCollection<T>, return a subclass of it... one that you implement yourself to include the methods that you'd like to use without upgrading to VS2008/LINQ.
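A sketch of that idea under the VS2005 constraint, in plain C# 2.0 with no LINQ (the class and member names are mine):

    using System;
    using System.Collections.Generic;
    using System.Collections.ObjectModel;

    // A ReadOnlyCollection<T> subclass that restores the Find/FindAll
    // conveniences the team is used to from List<T>.
    public class SearchableReadOnlyCollection<T> : ReadOnlyCollection<T>
    {
        public SearchableReadOnlyCollection(IList<T> list) : base(list) { }

        public T Find(Predicate<T> match)
        {
            foreach (T item in this)
                if (match(item))
                    return item;
            return default(T);
        }

        public List<T> FindAll(Predicate<T> match)
        {
            List<T> results = new List<T>();
            foreach (T item in this)
                if (match(item))
                    results.Add(item);
            return results;
        }
    }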
