Related
I got a method which accepts a collection as below
public IList<CountryDto> ApplyDefaults(IList<CountryDto> dtos)
{
//Iterates the collection
//Validates the items in collection
//If items are invalid
//Removes items e.g dtos.Remove(currentCountryDto)
return dtos;//Do I need to do this?
}
My question is since, the reference to the collection is not changed, should I return the collection again from the method?
For: By returning the collection back, I make it explicit in the signature and user is aware that the items in the collection could be different from the original source. Sort of it avoid ambiguity.
Against: Since the validation doesnt change the reference of the collection, it doesn't make sense technically to return it.
What is the best approach in this case?
Note: I am not sure if this question is opinion based. I think probably I missing something here on design side.
In every programming language consistency of your own code / library with the approach of the core libraries is of high value. Hence, inspecting how Collections.sort() or Collection.swap() and Collections.shuffle() are defined, I would suggest to not return the input parameter, if you intend to modify it. In addition, your method should be named in such a way, that it is obvious the input parameter gets modified. Otherwise your method will be considered to have side-effects.
Returning a value most often suggests that it is a new instance which reflects the work, performed by the method or is used for method-chaining in case of builders.
Given your comments/requirements:
Does not need to report if defaults are applied.
ApplyDefaults is complicated and invoking other services and not intended to produce a fluent API
ApplyDefaults is a "black box"; validation logic is injected so the calling code doesn't know/care about the validation
I think based on these, this method definitely should not return the reference to the incoming list, even if no validation is applied. Firstly, unless the API is clearly built around method chaining (which you indicated you do not want), returning a List<T> type usually indicates a new List is being created. Secondly, if a new list is not created, users may find themselves modifying the list in ways they didn't expect.
Consider:
IList<CountryDto> originalCountries = Service.GetCountries();
IList<CountryDto> validatedCountries = ApplyDefaults(originalCountries);
validatedCountries.Add(mySpecialCountry);
OutputOriginalCountries(originalCountries);
OutputValidatedCountries(validatedCountries);
This code isn't very special, and a fairly common pattern. If ApplyDefaults returned a reference to the same originalCountries collection, then mySpecialCountry would also be added to originalCountries. This would violate the Principle of Least Astonishment.
This would be exacerbated if this behaviour changed depending on whether or not items were validated/filtered. Since the validation logic is a black-box of behaviour that the caller doesn't know or care about, the API consumer could not depend on whether or not it returned the same reference. They would either have to do their own reference check (e.g., if (myValidatedCountries == myInputCountries)), or simply make a copy every time. Regardless, this becomes another weird behaviour that the programmer has to juggle when working with the API.
I think that the method should either:
A) always return a copied list with the items filtered out (public IList<CountryDto> ApplyDefaults(IEnumerable<CountryDto> dtos))
B) modify the incoming list in-place (public void ApplyDefaults(IList<CountryDto> dtos))
For option A, depending on the size of your list, this incurs the possible unnecessary work of creating a copied list every time even if no filtering is performed. However, the validation/filtering logic might be simpler. You might be able to use LINQ queries to apply the filtering nicely. Additionally, removing items from a list is generally costly as it has to rebuild the internal array. So it might actually be faster to build a new list. You may even simplify the signature here to be IEnumerable<CountryDto>; this allows for wider usage and is extremely obvious that you're creating a new collection.
For option B, if no validation is required, then no work is done and the method is essentially "free" (no array rebuilding, no copying, no reference changes). But if there is significant validation, the removal aspect may be costly. Since you're not method chaining, this version should have a void return type as it's much more obvious to the developer that this is modifying the list in-place. This follows other commonly known methods like List<T>.Sort. Furthermore, if a user wants to have a separate originalCountries and validatedCountries they can always make a copy:
var validatedCountries = originalCountries.ToList();
ApplyDefaults(validatedCountries);
Ultimately, which one you choose might depend on performance. If validation/removal is cheap and rare, then modifying the list in-place might be best. If you're expecting a lot of changes to the list, it might simply be faster to produce a new copy every time.
Regardless, I would suggest you name the method with a little more clarity as well. For example:
public IList<CountryDto> GetValidCountries(IEnumerable<CountryDto> dtos)
public void RemoveInvalidCountries(IList<CountryDto> dtos)
Of course, the naming might be different depending on your actual code context (I suspect ApplyDefaults is a common/inherited method name and not specific to CountryDto)
I'd rather return boolean (or enum in an elaborated case: collection preserved intact,
changed, can't be validated etc.)
// true if the collection is changed, false otherwise
public Boolean ApplyDefaults(IList<CountryDto> dtos) {
Boolean result = false;
//Iterates the collection
//Validates the items in collection
//If items are invalid:
// Removes items e.g dtos.Remove(currentCountryDto)
// result = true;
...
return result;
}
...
if (ApplyDefaults(myData)) {
// Collection is changed, do some extra stuff
}
First of all: you cannot change the reference of the collection you send by parameter, because by default you're getting copy of it. You'd need to use a ref keyword in order to be able to change it.
Secondly: if your method has a return type, than it has to return an object. Your method is not called GetNewCollectionWithAppliedDefaults, but ApplyDefaults which implies that the collection will be modified. You should either return boolean true/false to inform user changes were done or always return parameter's collecion (to allow nested methods calling).
Also, why would you think it doesn't make sense to return a collection? I'd say there's no argument against it. Turn the question around: "why wouldn't I return the collection and could it harm my code"?
Technically, I would say there is not much difference between the two.
However, and as you pointed out, a common used convention is that a function should only return an object it creates. Basically, that would mean that a function that returns an object is generating one while a function which doesn't return anything is modifying the object passed as a parameter.
Again, this is only a convention and it is not widely used within the C# community, but in the python community for example, it is.
Some people, returns a Boolean (or an error code) instead as an indicator of an error (like the old dos command line). I don't like this approach and prefer by far raising exceptions that I can handle later on.
Finally, the best approach in my regard, is to return a value that indicates if a change was done by the function and eventually a value indicating how much of a change was done. It can be a Boolean or it can be the number of inserted/removed elements...
In any case, try to be consistent with the approach you chose, if not in all your code, at least within a single project. Sometimes, you will have no other choice but to abide with the convention used by your teammates.
(My answer is based on the Java viewpoint; C++ and C# programmers might have a different take.) I think it's best to return the collection. The fact that the collection you're returning is the same collection that was given is just an implementation detail, and in future versions of the code, you might want to change that. Document that the collection returned might not be the same one passed in.
If, on the other hand, you want to lock in the design that this method modifies a collection in place, document it that way and don't return the collection. I prefer not to do it this way, but I can see advantages in some contexts.
In your case I would leave void since ApplyDefaults clearly states what its doing.
Also, it might be a good idea to ApplyDefaults in the collection itself. Subclass IList or List or whatever and then you'd call like this:
myCollection.ApplyDefaults();
Which is just obvious.
So I didn't find any elegant solution for this, either googling or throughout stackoverflow. I guess that I have a very specific situation in my hands, anyway here it goes:
I have a object structure, which I don't have control of, because I receive this structure from an external WS. This is quite a huge object, with various levels of fields and properties, and this fields and properties can or can't be null, in any level. You can think of this object as an anemic model, it doesn't have behaviour, just state.
For the purpose of this question, I'll give you a simplified sample that simulates my situation:
Class A
PropB1
PropC11
PropLeaf111
PropC12
PropLeaf112
PropB2
PropC21
PropLeaf211
PropC22
PropLeaf221
So, throughout my code I have to access a number of these properties, in different levels, to do some math in order to calculate what I need. Basically for each type of calculation that I have to do, I have to test each level of the properties that I need, to check if it's not null, in which case I would return (decimal) 0, or any other default value depending on the business logic.
Sample of a math that I have to do with it:
var value = 0;
if (objClassA.PropB1 != null && objClassA.PropB1.PropC11 != null) {
var leaf = objClassA.PropB1.PropC11.PropLeaf111;
value = leaf.HasValue ? leaf.Value : value;
}
Just to be very, the leaf properties of this structure would always be primitives, or nullable primitives in which case I give the proper treatment. This is "the logic" that I have to do for each property that I need, and sometimes I have to use quite some of them. Also the real structure is quite bigger, so the number of verifications that I would need to do, would also be bigger for each necessary property.
Now, I came up with some ideas, none of them I think is ideal:
Create methods to gather the properties, where it would abstract any necessary verification, or the logic to get default values. The drawback is that it would have, in my opinion, quite some duplicated code, since the verifications and the default values would be similar for some groups of fields.
Create a single generic method, where it receives a object, and a lamba function that access the required field. This method would try to execute the function and return it's result, and in case of an NullReferenceException, it would return a default value. The bright side of this one, is that it is realy generic, I just have to pass lambdas to access the properties, and the method would handle any problem. The drawback of it, is that I am using try -> catch to control logic, which is not the purpose of it, and the code might look confusing for other programmers that would eventually give maintenance to it.
Null Object Pattern, this would be the most elegant solution, I guess. It would have all the good points if it was a normal case. But the thing is the impact of providing Null Objects for this structure. Just to give a bit more of context, the software that I am working on, integrates with government's services, and the structure that I am working with, which is in the government's specifications, have some fields where null have some meaning which is different from a default value like "0". Also this specification changes from time to time, and the classes are generated again, and the post processing that I would have to do to create Null Objects, would also need maintenance, which seems a bit dangerous for me.
I hope that I made myself clear enough.
Thanks in advance.
Solution
This is a response as to how I solved my problem, based on the accepted answer.
I'm quite new to C#, and this kind of discution that was linked really helped me to come up with a elegant solution in many aspects. I still have the problem that depending where the code is executed, it uses .NET 2.0, but I also found a solution for this problem, where I can somewhat define extension methods: https://stackoverflow.com/a/707160/649790
And for the solution itself, I found this one the best:
http://www.codeproject.com/Articles/109026/Chained-null-checks-and-the-Maybe-monad
I can basically access the properties this way, and just do the math:
objClassA.With(o => o.PropB1).With(o => PropC11).Return(o => PropLeaf111, 0);
For each property that I need. It still isn't just:
objClassA.PropB1.PropC11.PropLeaf111
ofcourse, but it is far better that any solution that I found so far, since I was unfamiliar with Extension Methods, I really learned a lot.
Thanks again.
There is a strategy for dealing with this, involving the "Maybe" Monad.
Basically it works by providing a "fluent" interface where the chain of properties is interrupted by a null somewhere along the chain.
See here for an example: http://smellegantcode.wordpress.com/2008/12/11/the-maybe-monad-in-c/
And also here:
http://www.codeproject.com/Articles/109026/Chained-null-checks-and-the-Maybe-monad
http://mikehadlow.blogspot.co.uk/2011/01/monads-in-c-5-maybe.html
It's related to but not quite the same as what you seem to need; however, perhaps it can be adapted to your needs. The concepts are fairly fundamental.
I've written a wrapper class around a 3rd party library that requires properties to be set by calling a Config method and passing a string formatted as "Property=Value"
I'd like to pass all the properties in a single call and process them iteratively.
I've considered the following:
creating a property/value class and then creating a List of these
objects
building a string of multiple "Property=Value" separating them
with a token (maybe "|")
Using a hash table
All of these would work (and I'm thinking of using option 1) but is there a better way?
A bit more detail about my query:
The finished class will be included in a library for re-use in other applications. Whilst I don't currently see threading as a problem at the moment (our apps tend to just have a UI thread and a worker thread) it could become an issue in the future.
Garbage collection will not be an issue.
Access to arbitrary indices of the data source is not currently an issue.
Optimization is not currently an issue but clearly define the key/value pairs is important.
As you've already pointed out, any of the proposed solutions will accomplish the task as you've described it. What this means is that the only rational way to choose a particular method is to define your requirements:
Does your code need to support multiple threads accessing the data source simultaneously? If so, using a ConcurrentDictionary, as Yahia suggested, makes sense. Otherwise, there's no reason to incur the additional overhead and complexity of using a concurrent data structure.
Are you working in an environment where garbage collection is a problem (for example, an XNA game)? If so, any suggestion involving the concatenation of strings is going to be problematic.
Do you need O(1) access to arbitrary indices of the data source? If so, your third approach makes sense. On the other hand, if all you're doing is iterating over the collection, there's no reason to incur the additional overhead of inserting into a hashtable; use a List<KeyValuePair<String, String>> instead.
On the other hand, you may not be working in an environment where the optimization described above is necessary; the ability to clearly define the key/value pairs programatically may be more important to you. In which case using a Dictionary is a better choice.
You can't make an informed decision as to how to implement a feature without completely defining what the feature needs to do, and since you haven't done that, any answer given here will necessarily be incomplete.
Given your clarifications, I would personally suggest the following:
Avoid making your Config() method thread-safe by default, as per the MSDN guidelines:
By default, class libraries should not be thread safe. Adding locks to create thread-safe code decreases performance, increases lock contention, and creates the possibility for deadlock bugs to occur.
If thread safety becomes important later, make it the caller's responsibility.
Given that you don't have special performance requirements, stick with a dictionary to allow key/value pairs to be easily defined and read.
For simplicity's sake, and to avoid generating lots of unnecessary strings doing concatenations, just pass the dictionary in directly and iterate over it.
Consider the following example:
var configData = new Dictionary<String, String>
configData["key1"] = "value1";
configData["key2"] = "value2";
myLibraryObject.Config(configData);
And the implementation of Config:
public void Config(Dictionary<String, String> values)
{
foreach(var kvp in values)
{
var configString = String.Format("{0}={1}", kvp.Key, kvp.Value);
// do whatever
}
}
You could use Dictionary<string,string>, the items are then of type KeyValuePair<string,string> (this correpsonds to your first idea)
You can then use myDict.Select(kvp=>string.Format("{0}={1}",kvp.Key,kvp.Value)) to get a list of strings with the needed formatting
Use for example a ConcurrentDictionary<string,string> - it is thread-safe and really fast since most operations are implemented lock-free...
You could make a helper class that uses reflection to turn any class into a Property=Value collection
public static class PropertyValueHelper
{
public static IEnumerable<string> GetPropertyValues(object source)
{
Type t = source.GetType();
foreach (var property in t.GetProperties())
{
object value = property.GetValue(source, null);
if (value != null)
{
yield return property.Name + "=" + value.ToString();
}
else
{
yield return property.Name + "=";
}
}
}
}
You would need to add extra logic to handle enumerations, indexed properties, etc.
I have a huge dictionary of blank values in a variable called current like so:
struct movieuser {blah blah blah}
Dictionary<movieuser, float> questions = new Dictionary<movieuser, float>();
So I am looping through this dictionary and need to fill in the "answers", like so:
for(var k = questions.Keys.GetEnumerator();k.MoveNext(); )
{
questions[k.Current] = retrieveGuess(k.Current.userID, k.Current.movieID);
}
Now, this doesn't work, because I get an InvalidOperationException from trying to modify the dictionary I am looping through. However, you can see that the code should work fine - since I am not adding or deleting any values, just modifying the value. I understand, however, why it is afraid of my attempting this.
What is the preferred way of doing this? I can't figure out a way to loop through a dictionary WITHOUT using iterators.
I don't really want to create a copy of the whole array, since it is a lot of data and will eat up my ram like its still Thanksgiving.
Thanks,
Dave
Matt's answer, getting the keys first, separately is the right way to go. Yes, there'll be some redundancy - but it will work. I'd take a working program which is easy to debug and maintain over an efficient program which either won't work or is hard to maintain any day.
Don't forget that if you make MovieUser a reference type, the array will only be the size of as many references as you've got users - that's pretty small. A million users will only take up 4MB or 8MB on x64. How many users have you really got?
Your code should therefore be something like:
IEnumerable<MovieUser> users = RetrieveUsers();
IDictionary<MovieUser, float> questions = new Dictionary<MovieUser, float>();
foreach (MovieUser user in users)
{
questions[user] = RetrieveGuess(user);
}
If you're using .NET 3.5 (and can therefore use LINQ), it's even easier:
IDictionary<MovieUser, float> questions =
RetrieveUsers.ToDictionary(user => user, user => RetrieveGuess(user));
Note that if RetrieveUsers() can stream the list of users from its source (e.g. a file) then it will be efficient anyway, as you never need to know about more than one of them at a time while you're populating the dictionary.
A few comments on the rest of your code:
Code conventions matter. Capitalise the names of your types and methods to fit in with other .NET code.
You're not calling Dispose on the IEnumerator<T> produced by the call to GetEnumerator. If you just use foreach your code will be simpler and safer.
MovieUser should almost certainly be a class. Do you have a genuinely good reason for making it a struct?
Is there any reason you can't just populate the dictionary with both keys and values at the same time?
foreach(var key in someListOfKeys)
{
questions.Add(key, retrieveGuess(key.userID, key.movieID);
}
store the dictionary keys in a temporary collection then loop over the temp collection and use the key value as your indexer parameter. This should get you around the exception.
My most used mini pattern is:
VideoLookup = new ArrayList { new ArrayList { buttonVideo1, "Video01.flv" },
new ArrayList { buttonVideo2, "Video02.flv" },
new ArrayList { buttonVideo3, "Video03.flv" },
new ArrayList { buttonVideo4, "Video04.flv" },
new ArrayList { buttonVideo4, "Video04.flv" }
};
This means that rather than a switch statement with a case for each button I can instead just compare the button that was clicked with each item in the ArrayList. Then when I've found a match I launch the correct file (although the action that's the 2nd part the "lookup" could be a delegate or anything else).
The main benefit is that I don't have the problem of remembering to add all the correct code for each switch statement case, I just add a new item to the lookup ArrayList.
(Yes I know using an ArrayList isn't the best way to go, but it's old code. And I know that looping through an array each time isn't as efficient as using a switch statement, but this code isn't in a tight loop)
Does anyone else have any mini-patterns they use that save time/effort or make code more readable? They don't have to just be GUI related.
Update: Don't copy this code, I knew it was bad, but I didn't realise how bad. Use something like this instead.
Hashtable PlayerLookup = new Hashtable();
PlayerLookup.Add(buttonVideo1, "Video01.flv");
PlayerLookup.Add(buttonVideo2, "Video02.flv");
PlayerLookup.Add(buttonVideo3, "Video03.flv");
PlayerLookup.Add(buttonVideo4, "Video04.flv");
string fileName = PlayerLookup[currentButton].ToString();
please please please omg use this version.
VideoLookup = new Dictionary<Button, string> {
{ buttonVideo1, "Video01.flv" },
{ buttonVideo2, "Video02.flv" },
{ buttonVideo3, "Video03.flv" },
{ buttonVideo4, "Video04.flv" },
{ buttonVideo4, "Video04.flv" }
};
You could just create a struct or object that has a button reference and a string representing the file name and then a List of these things. Or, you could just use a Dictionary and make it even easier on yourself. Lots of ways to improve. :)
On the subject of switches, I write this kind of thing a lot:
public Object createSomething(String param)
{
return s == null ? new NullObject() :
s.equals("foo") ? new Foo() :
s.equals("bar") ? new Bar() :
s.equals("baz") || s.equals("car") ? new BazCar() :
new Object();
}
I think it looks more readable compared to regular switch statements and has the ability to have more complex comparisons. Yeah, it'll be slower because you need to compare each condition but 99% of the time that doesn't matter.
In Java, I sometimes find that private inner classes which implement a public interface can be very helpful for objects composed of tightly-coupled elements. I've seen this mini-pattern (idiom) discussed in the context of creating UIs with Allen Holub's Visual Proxy architecture, but not much beyond that. As far as I know it doesn't have a name.
For example, let's say you have a Collection interface that can provide an Iterator:
public interface Collection
{
...
public Iterator iterate();
}
public interface Iterator
{
public boolean hasNext();
public Object next();
}
If you have a Stack that implements Collection, then you could implement its Iterator as a private inner class:
public class Stack implements Collection
{
...
public Iterator iterate()
{
return new IteratorImpl();
}
private class IteratorImpl implements Iterator
{
public boolean hasNext() { ... }
public Object next() { ... }
}
}
Stack.IteratorImpl has complete access to all of Stack's private methods and fields. At the same time, Stack.IteratorImpl is invisible to all except Stack.
A Stack and its Iterator will tend to be tightly coupled. Worst case, implementing Stack's Iterator as a public class might force you to break Stack's encapsulation. The private inner class lets you avoid this. Either way, you avoid polluting the class hierarchy with something that's really an implementation detail.
In my last job I wrote a C# version of the Enforcements concept introduced in C++ by Andrei Alexandrescu and Petru Marginean (original article here).
This is really cool because it lets you interweave error handling or condition checking in with normal code without breaking the flow - e.g.:
string text = Enforce.NotNull( myObj.SomeMethodThatGetsAString(), "method returned NULL" );
This would check if the first argument is null, throw an EnforcementException with the second argument as the message if it is, or return the first argument otherwise. There are overloads that take string formatting params too, as well as overloads that let you specify a different exception type.
You could argue that this sort of thing is less relevant in C# because the runtime checking is better and already quite informative - but this idiom lets you check closer to the source and provide more information, while remaining expressive.
I use the same system for Pre and Post condition checking.
I might write an Open Source version and link it from here.
for when I'm churning out code fast (deadlines! deadlines! why am I on stackoverflow.com? deadlines!), I wind up with this kind code:
Button1.Click += (o,e) => { DoSomething(foo); };
Will this cause me memory leaks at some point? I'm not sure! This probably deserves a question. Ack! Deadlines!
For Windows forms I'll often use the Tag field to put a psuedo-command string so that I can have a single event handler for a shared set of buttons. This works especially well for buttons that do pretty much the same thing but are parameterized.
In your first example, I would set the Tag for the buttons equal to the name of the video file -- no lookup required.
For applications that have some form of text-based command processor for dispatching actions, the Tag is a string that is just fed into the command processor. Works nice.
(BTW: I've seen the term "idiom" used for mini-patterns...)
A new idiom that I'm beginning to see in C# is the use of closure parameters that encapsulate some configuration or setup that the method will need to function. This way, you can control the relative order that code must run from within your method.
This is called a nested closure by Martin Fowler: http://www.martinfowler.com/dslwip/NestedClosure.html
Perhaps there's already a better way of doing this (vbEx2005/.Net2.0), but I've found it useful to have a class of generic delegate-creators which accept a method that takes some parameters, along with the values of either all, or all but one, of those parameters, and yields a delegate which, when invoked, will call the specified function with the indicated parameters. Unlike ParamArray-based things like ParameterizedThreadStart, everything is type-safe.
For example, if I say:
Sub Foo(param1 As Integer, param2 As String)
...
End Sub
...
Dim theAct as Action(of Integer) = _
ActionOf(of Integer).NewInv(AddressOf Foo,"Hello there")
theAct(5)
...
the result will be to call Foo(5, "Hello there") on object where Foo was declared. Unfortunately, I end up having to have separate generic classes and methods for every different number of parameters I want to support, but it's nicer to have all the cut-and-paste in one file than to have extra code scattered about everywhere to create the appropriate delegates.