Converting C# to Java, how do I handle IEnumerable? - c#

I've been tasked with converting some C# code to Java. Most of it is fine, but I am having some trouble working out how to translate IEnumerable.
The code I have in C# is this:
public IEnumerable<Cat> Reorder(IEnumerable<Cat> catList)
{
// some logic that reorders the list
}
My googling suggested that I should be using Iterable<Cat> as an alternative. However, I also stumbled upon something saying you should never have Iterable<T> as a return type.
I'm a bit unfamiliar with data structures in Java. What should I be returning, and how would you re-order a collection of objects?
In C#, assuming you don't use linq, you'd create an empty array or List or similar, and add the items in as you repeatedly iterate through them, checking the criteria. Which data structure would I use in Java to achieve this?

It depends a bit on what you want to do with the return value.
Java has no LINQ, so using an Iterable<T> other than inside a foreach loop is a bit of a PITA. This blog post describes it in more depth.
The alternative is to return a Collection<T>.
Having said that, returning an Iterable<T> is not wrong, it just makes it harder to consume the return value in certain scenarios.
In Java you would use an implementation of List<T> like ArrayList<T> for temporary instances inside methods. When you want to return that instance from a method, the return type would be the interface List<T> or Collection<T>.
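A minimal sketch of that advice, assuming a hypothetical Cat class with a name and an age (neither appears in the question) and an invented "kittens first" rule. The method accepts an Iterable<Cat>, builds its result in ArrayList instances, and exposes only List<Cat> to the caller:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the question's Cat type; the fields are assumptions.
final class Cat {
    final String name;
    final int age;

    Cat(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

final class CatReorderExample {
    // Accepts any Iterable (the closest match to IEnumerable) and returns a List.
    // The "kittens first" rule is invented purely for illustration.
    static List<Cat> reorder(Iterable<Cat> cats) {
        List<Cat> kittens = new ArrayList<>();
        List<Cat> adults = new ArrayList<>();
        for (Cat cat : cats) {
            if (cat.age < 1) {
                kittens.add(cat);
            } else {
                adults.add(cat);
            }
        }
        // Kittens keep their relative order, followed by the adults in theirs.
        List<Cat> result = new ArrayList<>(kittens);
        result.addAll(adults);
        return result;
    }

    public static void main(String[] args) {
        List<Cat> input = Arrays.asList(new Cat("Tom", 3), new Cat("Kit", 0), new Cat("Felix", 5));
        for (Cat cat : reorder(input)) {
            System.out.println(cat.name);
        }
    }
}
```

Because the return type is List<Cat>, callers can ask for size() or index into the result, which a bare Iterable<Cat> would not allow.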

You could do something like Collections.sort(list); if your objects implement the Comparable interface (something similar can be done in C# with IComparable and IComparer).
"add the items in as you repeatedly iterate through them," I hope you don't really mean what I'm thinking you mean... There are a hell of a lot of sorting algorithms.

There's nothing wrong with Iterable, when you need to use it.
However, considering what the code above tells me, I think you'd be better going with java.util.Collection or java.util.List (the latter if catList is definitely a list of cats).
The way of reordering actually depends on the ordering requirements. It's probably as simple as using Collections.sort(List list)
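As a rough illustration of the Collections.sort route, here is a sketch that assumes a hypothetical SortableCat class whose natural order is by age (the class, its fields, and the ordering are all assumptions, not part of the question). It shows both the Comparable approach and an explicit Comparator, which plays the role of C#'s IComparer<T>:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical cat type; ordering by age is an assumption made for this sketch.
final class SortableCat implements Comparable<SortableCat> {
    final String name;
    final int age;

    SortableCat(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public int compareTo(SortableCat other) {
        return Integer.compare(age, other.age);
    }
}

final class CatSortExample {
    // Sorts a copy using the natural order defined by compareTo.
    static List<SortableCat> sortedByAge(Collection<SortableCat> cats) {
        List<SortableCat> copy = new ArrayList<>(cats);
        Collections.sort(copy);
        return copy;
    }

    // Sorts a copy with an explicit Comparator instead of the natural order.
    static List<SortableCat> sortedByName(Collection<SortableCat> cats) {
        List<SortableCat> copy = new ArrayList<>(cats);
        Collections.sort(copy, Comparator.comparing((SortableCat c) -> c.name));
        return copy;
    }

    public static void main(String[] args) {
        List<SortableCat> cats = Arrays.asList(
                new SortableCat("Tom", 3), new SortableCat("Kit", 0), new SortableCat("Felix", 5));
        for (SortableCat cat : sortedByAge(cats)) {
            System.out.println(cat.name + " " + cat.age);
        }
    }
}
```

Both methods sort a copy, so the caller's collection is left untouched; sorting in place with Collections.sort(list) is equally valid when the caller expects it.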

You should identify the data structure you need.
Basically the 3 great data structure families are:
List: An ordered index-based collection;
Set: A collection that contains no duplicate elements;
Map: A key-value based collection (a Dictionary in C#). Not suited to this particular case.
Depending on what you need, you should return List<Cat> or Set<Cat>.
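To make the List/Set distinction concrete, here is a small sketch using plain strings in place of cats (an assumption made to keep it short; a Set<Cat> would only remove duplicates by value if Cat overrides equals and hashCode). LinkedHashSet is one possible Set implementation, chosen here because it keeps first-seen order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ListVersusSetExample {
    // A List keeps every element, duplicates included, in insertion order.
    static List<String> toList(Iterable<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            result.add(name);
        }
        return result;
    }

    // A Set drops duplicates; LinkedHashSet also preserves first-seen order.
    static Set<String> toSet(Iterable<String> names) {
        Set<String> result = new LinkedHashSet<>();
        for (String name : names) {
            result.add(name);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Tom", "Kit", "Tom");
        System.out.println(toList(names)); // prints [Tom, Kit, Tom]
        System.out.println(toSet(names));  // prints [Tom, Kit]
    }
}
```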

Related

Get original value from HashSet

UPDATE:
Starting with .NET Framework 4.7.2, HashSet<T>.TryGetValue is available (see the docs).
HashSet.TryGetValue - SO post
I have a problem with HashSet because it does not provide any method similar to TryGetValue known from Dictionary. And I need such method -- passing element to find in the set, and set returning element from its collection (when found).
Sidenote -- "why do you need element from the set, you already have that element?". No, I don't, equality and identity are two different things.
HashSet is not sealed but all its fields are private, so deriving from it is pointless. I cannot use Dictionary instead because I need SetEquals method. I was thinking about grabbing a source for HashSet and adding desired method, but the license is not truly open source (I can look, but I cannot distribute/modify). I could use reflection but the arrays in HashSet are not readonly meaning I cannot bind to those fields once per instance lifetime.
And I don't want to use a full-blown library for just a single class.
So far I am stuck with LINQ SingleOrDefault. So the question is how to fix this: how can I have a HashSet with TryGetValue?
Probably you should switch from a HashSet to a SortedSet
There is a simple TryGetValue() for a SortedSet:
public bool TryGetValue(ref T element)
{
var foundSet = sortedSet.GetViewBetween(element, element);
if(foundSet.Count == 1)
{
element = foundSet.First();
return true;
}
return false;
}
When called, the element only needs to have the properties set that the Comparer uses. The method returns the matching element found in the set.
I agree this is something which is basically missing. While it's only useful in rare cases, I think they're significant rare cases, most notably key canonicalization.
I can only think of one suggestion at the moment, and it's truly foul.
You can specify your own IEqualityComparer<T> when creating a HashSet<T> - so create one which remembers the arguments to the last positive (i.e. true-returning) Equals comparison it has performed. You can then call Contains, and see what the equality comparer was asked to compare.
Caveats:
This holds on to references unnecessarily, so could end up preventing objects being garbage collected
You'd potentially want to do this on a per-thread basis (if you've got a set that isn't modified after initialization, but is then read by multiple threads, for example)
It assumes that HashSet<T> doesn't use any optimization such as "if the references are equal, don't bother consulting the equality comparer"
It's fundamentally a horrible abuse
I've been trying to think of other alternatives in terms of finding intersections, but I haven't got anywhere yet...
As noted in comments, it would be worth encapsulating this as far as possible - I suspect you only need a very limited set of operations, so I'd wrap a HashSet<T> in your own class and only expose the operations you really need - that way you get to clear the "cache" after each operation, removing my first objection above.
It still feels like a horrible abuse to me, but...
As others have suggested, an alternative would be to use a Dictionary<TKey, TValue> and implement SetEquals yourself. That would be simple enough to do - and again, you'd want to encapsulate this in your own type. Either way, you should probably design the type itself first, and then implement it using either a HashSet<> or a Dictionary<,> as an implementation detail.
Sounds like you're trying to use the wrong tool. True, you can save some memory using a HashSet, but it seems to me that you are trying to achieve a different goal: get the actual element that is merely equal to a representation.
So in reality they are two different elements. Just the memento (a unique representation) is equal.
Therefore you'd be better off using a Dictionary where you add your elements as both Key and Value. That way you're able to get the identical element back, but you lose SetEquals.
I suppose SetEquals, in its implementation, does nothing much different than sequentially compare two HashSets in bucket order, failing on the first non-equality.
So you should be equally well off using a simple SequenceEqual() (LINQ) comparing the two Keys collections.
So this extension method could do
public static bool SetEqual<T,G>(this IDictionary<T,G> d, IDictionary<T,G> e)
{
return d.Keys.SequenceEqual(e.Keys);
}
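This should work, because a Dictionary basically is a HashSet with an associated value, and it is more appropriate to your problem. (OK, to be correct, the code should go for Dictionary<> instead of IDictionary<> because Key order matters. Note that Dictionary<> does not guarantee an enumeration order either, so the comparison is only reliable when both key collections enumerate in the same order.)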
If you need an IEnumerable<> on the second parameter try sorting to get a defined order (not so efficient).
Finally added in .NET 4.7.2:
HashSet.TryGetValue(T, T) Method
An SO post with more details
Hopefully I'm not blind, but I haven't seen this answer anywhere. If you want Dictionary's TryGetValue, you can just steal it.
theHashset.ToDictionary(item => item.ID).TryGetValue(key, out value)
All you need is a quick lambda for determining unique keys.

Select and ForEach on List<> [duplicate]

This question already has answers here:
LINQ equivalent of foreach for IEnumerable<T>
(22 answers)
Closed 9 years ago.
I am quite new to C# and was trying to use lambda expressions.
I am having a list of object. I would like to select items from the list and perform a foreach operation on the selected items. I know I could do it without using a lambda expression, but wanted to know if this was possible using a lambda expression.
So I was trying to achieve a similar result
List<UserProfile> users = new List<UserProfile>();
// ..load users with list of users
List<UserProfile> selecteditem = users.Where(i => i.UserName=="").ToList();
foreach(UserProfile item in selecteditem)
{
item.UserName = "NA";
}
it was possible to do
users.Where(i => i.UserName=="").ToList().ForEach(i=>i.UserName="NA");
but not something like this
users.select(i => i.UserName=="").ForEach(i=>i.UserName="NA");
Can someone explain this behaviour..
Let's start here:
I am having a list of object.
It's important to understand that, while accurate, that statement leaves a C# programmer wanting more. What kind of object? In the .NET world, it pays to always keep in mind what specific type of object you are working with. In this case, that type is UserProfile. This may seem like a side issue, but it will become more relevant to the specific question very quickly. What you want to say instead is this:
I have a list of UserProfile objects.
Now let's look at your two expressions:
users.Where(i => i.UserName=="").ToList().ForEach(i=>i.UserName="NA");
and
users.Where(i => i.UserName=="").ForEach(i=>i.UserName="NA");
The difference (aside from the fact that only the first compiles) is that you need to call .ToList() to convert the result of the Where() function to a List type. Now we begin to see why you want to always think in terms of types when working with .NET code, because it should now occur to you to wonder, "What type am I working with, then?" I'm glad you asked.
The .Where() function results in an IEnumerable<T> type, which is actually not a full type all by itself. It's an interface that describes certain things a type that implements its contract will be able to do. The IEnumerable interface can be confusing at first, but the important thing to remember is that it defines something that you can use with a foreach loop. That is its sole purpose. Almost anything in .NET that you can use with a foreach loop implements the IEnumerable interface: arrays, lists, collections. There are other things you can loop over as well. Strings, for example. Many methods you have today that require a List or Array as an argument can be made more powerful and flexible simply by changing that argument type to IEnumerable.
.NET also makes it easy to create state machine-based iterators that will work with this interface. This is especially useful for creating objects that don't themselves hold any items, but do know how to loop over items in a different collection in a specific way. For example, I might loop over just items 3 through 12 in an array of size 20. Or I might loop over the items in alphabetical order. The important thing here is that I can do this without needing to copy or duplicate the originals. This makes it very efficient in terms of memory, and it's structured in such a way that you can easily compose different iterators together to get very powerful results.
The IEnumerable<T> type is especially important, because it is one of two types (the other being IQueryable) that form the core of the LINQ system. Most of the LINQ operators you can use (.Where(), .Select(), .Any(), etc.) are defined as extensions to IEnumerable.
But now we come to an exception: ForEach(). This method is not part of IEnumerable. It is defined directly as part of the List<T> type. So, we see again that it's important to understand what type you are working with at all times, including the results of each of the different expressions that make up a complete statement.
It's also instructive to go into why this particular method is not part of IEnumerable directly. I believe the answer lies in the fact that the LINQ system takes a lot of inspiration from the functional programming world. In functional programming, you want to have operations (functions) that do exactly one thing, with no side effects. Ideally, these functions will not alter the original data, but rather they will return new data. The ForEach() method is implicitly all about creating side effects that alter data. It's just bad functional style. Additionally, ForEach() breaks method chaining, in that it doesn't return a new IEnumerable.
There is one more lesson to learn here. Let's take a look at your original snippet:
List<UserProfile> users = new List<UserProfile>();
// ..load users with list of users
List<UserProfile> selecteditem = users.Where(i => i.UserName=="").ToList();
foreach(UserProfile item in selecteditem)
{
item.UserName = "NA";
}
I mentioned something earlier that should help you significantly improve this code. Remember that bit about how you can have IEnumerable items that loop over a collection, without duplicating it? Think about what happens if you wrote that code this way, instead:
List<UserProfile> users = new List<UserProfile>();
// ..load users with list of users
var selecteditem = users.Where(i => i.UserName=="");
foreach(UserProfile item in selecteditem)
{
item.UserName = "NA";
}
All I did was remove the call to .ToList(), but everything will still work. The only thing that changed is we avoided copying the matching items into a new list. That should make this code faster. In some circumstances, it can make the code a lot faster. Something to keep in mind: when working with the LINQ operator methods, it's generally good to avoid calling .ToArray() or .ToList() whenever possible, and it's possible a lot more often than you might think.
As for the foreach() {...} vs .ForEach( ... ): the former is still perfectly appropriate style.
Sure, it's quite simple. List has a ForEach method. There is no such method, or extension method, for IEnumerable.
As to why one has a method and another doesn't, that's an opinion. Eric Lippert blogged on the topic if you're interested in his.

C# dictionary vs list usage

I had two questions. I was wondering if there is an easy class in the C# library that stores pairs of values instead of just one, so that I can store a class and an integer in the same node of the list. I think the easiest way is to just make a container class, but that is extra work each time, so I wanted to know whether I should be doing so or not. I know that later versions of .NET (I am using 3.5) have tuples that I can store, but they are not available to me.
I guess the bigger question is: what are the memory disadvantages of using a dictionary to store the integer-to-class map, even though I don't need O(1) access and could afford to just search the list? What is the minimum size of the hash table? Should I just make the wrapper class I need?
If you need to store an unordered list of {integer, value}, then I would suggest making the wrapper class. If you need a data structure in which you can look up integer to get value (or, look up value to get integer), then I would suggest a dictionary.
The decision of List<Tuple<T1, T2>> (or List<KeyValuePair<T1, T2>>) vs Dictionary<T1, T2> is largely going to come down to what you want to do with it.
If you're going to be storing information and then iterating over it, without needing to do frequent lookups based on a particular key value, then a List is probably what you want. Depending on how you're going to use it, a LinkedList might be even better - slightly higher memory overheads, faster content manipulation (add/remove) operations.
On the other hand, if you're going to be primarily using the first value as a key to do frequent lookups, then a Dictionary is designed specifically for this purpose. Key value searching and comparison is significantly improved, so if you do much with the keys and your list is big a Dictionary will give you a big speed boost.
Data size is important to the decision. If you're talking about a couple hundred items or less, a List is probably fine. Above that point the lookup times will probably impact more significantly on execution time, so Dictionary might be more worth it.
There are no hard and fast rules. Every use case is different, so you'll have to balance your requirements against the overheads.
You can use a list of KeyValuePair: http://msdn.microsoft.com/en-us/library/5tbh8a42.aspx
You can use a Tuple<T,T1>, a list of KeyValuePair<T, T1> - or, an anonymous type, e.g.
var list = something.Select(x => new { Key = x.Something, Value = x.Value });
You can use either KeyValuePair or Tuple
For Tuple, you can read the following useful post:
What requirement was the tuple designed to solve?

Methods: What is better List or object?

While I was programming I came up with this question,
What is better, having a method accept a single entity or a List of those entities?
For example I need a List of strings. I can either have:
a method accepting a List and return a List of strings with the results.
List<string> results = methodwithlist(objects);
or
a method accepting an object and returning a string. Then use this function in a loop to fill a list.
for (int i = 0; i < objects.Count; i++)
{
results.Add(methodwithsingleobject(objects[i]));
}
This is just an example. I need to know which one is better, or more used, and why.
Thanks!
Well, it's easy to build the first form when you've got the second - but using LINQ, you really don't need to write your own, once you've got the projection. For example, you could write:
List<string> results = objectList.Select(x => MethodWithSingleObject(x)).ToList();
Generally it's easier to write and test a method which only deals with a single value, unless it actually needs to know the rest of the values in the collection (e.g. to find aggregates).
I would choose the second because it's easier to use when you have a single string (i.e. it's more general purpose). Also, the responsibility of the method itself is clearer, because the method should not have anything to do with lists if its purpose is just to modify a string.
Also, you can simplify the call with Linq:
result = yourList.Select(p => methodwithsingleobject(p));
This question comes up a lot when learning any language, the answer is somewhat moot since the standard coding practice is to rely upon LINQ to optimize the code for you at runtime. But this presumes you're using a version of the language that supports it. But if you do want to do some research on this there are a few Stack Overflow articles that delve into this and also give external resources to review:
In .NET, which loop runs faster, 'for' or 'foreach'?
C#, For Loops, and speed test... Exact same loop faster second time around?
What I have learned, though, is not to rely too heavily on Count and to use Length on typed Collections as that can be a lot faster.
Hope this is helpful.

IEnumerable<T> as return type

Is there a problem with using IEnumerable<T> as a return type?
FxCop complains about returning List<T> (it advises returning Collection<T> instead).
Well, I've always been guided by a rule "accept the least you can, but return the maximum."
From this point of view, returning IEnumerable<T> is a bad thing, but what should I do when I want to use "lazy retrieval"? Also, the yield keyword is such a goodie.
This is really a two part question.
1) Is there inherently anything wrong with returning an IEnumerable<T>
No nothing at all. In fact if you are using C# iterators this is the expected behavior. Converting it to a List<T> or another collection class pre-emptively is not a good idea. Doing so is making an assumption on the usage pattern by your caller. I find it's not a good idea to assume anything about the caller. They may have good reasons why they want an IEnumerable<T>. Perhaps they want to convert it to a completely different collection hierarchy (in which case a conversion to List is wasted).
2) Are there any circumstances where it may be preferable to return something other than IEnumerable<T>?
Yes. While it's not a great idea to assume much about your callers, it's perfectly okay to make decisions based on your own behavior. Imagine a scenario where you had a multi-threaded object which was queueing up requests into an object that was constantly being updated. In this case returning a raw IEnumerable<T> is irresponsible. As soon as the collection is modified the enumerable is invalidated and will cause an exception to occur. Instead you could take a snapshot of the structure and return that value, say in a List<T> form. In this case I would just return the object as the direct structure (or interface).
This is certainly the rarer case though.
No, IEnumerable<T> is a good thing to return here, since all you are promising is "a sequence of (typed) values". Ideal for LINQ etc, and perfectly usable.
The caller can easily put this data into a list (or whatever) - especially with LINQ (ToList, ToArray, etc).
This approach allows you to lazily spool back values, rather than having to buffer all the data. Definitely a goodie. I wrote-up another useful IEnumerable<T> trick the other day, too.
IEnumerable is fine by me but it has some drawbacks. The client has to enumerate to get the results. It has no way to check for Count etc.
List is bad because you expose too much control; the client can add/remove etc. from it and that can be a bad thing.
Collection seems the best compromise, at least in FxCop's view.
I always use what seems appropriate in my context (e.g. if I want to return a read-only collection, I expose a collection as the return type and return List.AsReadOnly(), or IEnumerable for lazy evaluation through yield, etc.). Take it on a case-by-case basis.
About your principle: "accept the least you can, but return the maximum".
The key to managing the complexity of a large program is a technique called information hiding. If your method works by building a List<T>, it's not often necessary to reveal this fact by returning that type. If you do, then your callers may modify the list they get back. This removes your ability to do caching, or lazy iteration with yield return.
So a better principle for a function to follow is: "reveal as little as possible about how you work".
Returning IEnumerable<T> is OK if you're genuinely only returning an enumeration, and it will be consumed by your caller as such.
But as others point out, it has the drawback that the caller may need to enumerate if he needs any other info (for example Count). The .NET 3.5 extension method IEnumerable<T>.Count will enumerate behind the scenes if the return value does not implement ICollection<T>, which may be undesirable.
I often return IList<T> or ICollection<T> when the result is a collection - internally your method can use a List<T> and either return it as-is, or return List<T>.AsReadOnly if you want to protect against modification (e.g. if you're caching the list internally). AFAIK FxCop is quite happy with either of these.
"accept the least you can, but return the maximum" is what I advocate. When a method returns an object, what justification do we have for not returning the actual type, and for limiting the capabilities of the object by returning a base type? This, however, raises a question: how do we know what the "maximum" (actual type) will be when we design an interface? The answer is very simple. Only in extreme cases, where the interface designer is designing an open interface that will be implemented outside the application/component, would they not know what the actual return type may be. A smart designer should always consider what the method should be doing and what an optimal/generic return type should be.
E.g. If I am designing an interface to retrieve a vector of objects, and I know the count of returned objects are going to be variable, I'll always assume a smart developer will always use a List. If someone plans to return an Array, I'd question his capabilities, unless he/she is just returning the data from another layer that he/she doesn't own. And this is probably why FxCop advocates for ICollection (common base for List and Array).
The above being said, there are couple of other things to consider
if the returned data should be mutable or immutable
if the returned data be shared across multiple callers
Regarding the LINQ lazy evaluations, I am sure 95%+ of C# users don't understand the intricacies. It’s so non-oo-ish. OO promotes concrete state changes on method invocations. LINQ lazy evaluation promotes runtime state changes on an expression evaluation pattern (not something non-advanced users always follow).
One important aspect is that when you return a List<T> you are actually returning a reference. That makes it possible for a caller to manipulate your list. This is a common problem, for instance when a Business layer returns a List<T> to a GUI layer.
Just because you say you're returning IEnumerable doesn't mean you can't return a List. The idea is to reduce unneeded coupling. All that the caller should care about is getting a list of things, rather than the exact type of collection used to contain that list. If you have something that's backed by an array, then getting something like Count is going to be fast anyway.
I think your own guidance is great -- if you are able to be more specific about what you're returning without a performance hit (you don't have to e.g. build a List out of your result), do so. But if your function legitimately doesn't know what type it's going to find, like if in some situations you'll be working with a List and in some with an Array, etc., then returning IEnumerable is the "best" you can do. Think of it as the "greatest common multiple" of everything you might want to return.
I can't accept the chosen answer. There are ways of dealing with the scenario described, but using a List or whatever else you're using isn't one of them. The moment the IEnumerable is returned you have to assume that the caller might do a foreach. In that case it doesn't matter if the concrete type is List or spaghetti. In fact just indexing is a problem, especially if items are removed.
Any returned value is a snapshot. It may be the current contents of the IEnumerable, in which case if it's cached it should be a clone of the cached copy; if it's supposed to be more dynamic (like the results of a SQL query) then use yield return; however, allowing the container to mutate at will and supplying methods like Count and an indexer is a recipe for disaster in a multithreaded world. I haven't even gotten into the ability of the caller to call Add or Delete on a container your code is supposed to be in control of.
Also, returning a concrete type locks you into an implementation. Today internally you may be using a list. Tomorrow maybe you do become multithreaded and want to use a thread-safe container or an array or a queue or the Values collection of a dictionary or the output of a LINQ query. If you lock yourself into a concrete return type then you have to either change a bunch of code or do a conversion before returning.
IEnumerable is cool because you can use the yield iterator that gives to the consumer just the data they need but there is a cost hidden in the construct.
Let me explain it with an example. Let's say I am consuming this method:
IEnumerable GetFilesFromFolder(string path)
So, what do I get? To get all the files of my folder I have to iterate the enumeration, and that's fine, after all that's how enumerations work, but what if, for any reason, I have to enumerate it twice?
The second time should I expect a refreshed result or the result is idempotent? I do not know. I have to check the docs of the library / method.
The call to the GetEnumerator method of the enumeration done by the consumer, could, in fact, execute an I/O operation behind the scene, or an http call, or it could simply iterate an inner array, I can not know it for sure. I have to check the docs in the hope that this behavior is documented.
Does this detail matter? I think it does. At least from a performance perspective.
Even if the cost of iteration is low and CPU bound, it is not zero, and it can get even worse with chains of enumerations, which often turn debugging sessions into a nightmare.
I prefer to not give the consumer of my library doubts so whenever I know my API returns few elements I always use arrays as return type, and only when the data to return is huge I use IEnumerable or IAsyncEnumerable.
Anyway, if you want to return enumerations please document your API to tell consumers if the result is a snapshot or not.
