Fast way to check if IEnumerable<T> contains no duplicates (= is distinct) - c#

Is there a fast built-in way to check if an IEnumerable<string> contains only distinct strings?
In the beginning I started with:
var enumAsArray = enum.ToArray();
if (enumAsArray.Length != enumAsArray.Distinct().Count())
throw ...
However, this looks like it is O(2n) - is it? ToArray() might be O(1)?
This looks faster:
var set = new HashSet<string>();
foreach (var str in enum)
{
if (!set.Add(str))
throw ...
}
This should be O(n), however, is there a built-in way too?
Edit: Maybe Distinct() uses this internally?
Solution:
After considering all the comments and the answer, I wrote an extension method for my second solution, as this seems to be the fastest version and the most readable too:
public static bool ContainsDuplicates<T>(this IEnumerable<T> e)
{
var set = new HashSet<T>();
// ReSharper disable LoopCanBeConvertedToQuery
foreach (var item in e)
// ReSharper restore LoopCanBeConvertedToQuery
{
if (!set.Add(item))
return true;
}
return false;
}

Your second code sample is short, simple, clearly effective, and if not the completely perfect ideal solution, is clearly rather close to it. It seems like a perfectly acceptable solution to your particular problems.
Unless your use of that particular solution is shown to cause performance problems after you've noticed issues and done performance testing, I'd leave it as is. Given how little room I can see for improvement in general, that doesn't seem likely. It's not a sufficiently lengthy or complex solution that trying to find something "shorter" or more concise is going to be worth your time and effort.
In short, there are almost certainly better places in your code to spend your time; what you have already is fine.
To answer your specific questions:
However, this looks like it is O(2n) - is it?
Yes, it is.
ToArray() might be O(1)?
No, it's not.
Maybe Distinct() uses this internally?
It does use a HashSet, and it looks pretty similar, but it simply ignores duplicate items; it doesn't provide any indication to the caller that it has just passed a duplicate item. As a result, you need to iterate the whole sequence twice to see if it removed anything, rather than stopping when the first duplicate is encountered. This is the difference between something that always iterates the full sequence twice and something that might iterate the full sequence once, but can short circuit and stop as soon as it has ensured an answer.
is there a built-in way too?
Well, you showed one, it's just not as efficient. I can think of no entire LINQ based solution as efficient as what you showed. The best I can think of would be: data.Except(data).Any(). This is a bit better than your distinct compared to the regular count in that the second iteration can short circuit (but not the first) but it also iterates the sequence twice, and still is worse than your non-LINQ solution, so it's still not worth using.

Here is a possible refinement to the OP's answer:
public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> e)
{
var set = new HashSet<T>();
// ReSharper disable LoopCanBeConvertedToQuery
foreach (var item in e)
// ReSharper restore LoopCanBeConvertedToQuery
{
if (!set.Add(item))
yield return item;
}
}
You now have a potentially useful method to get the actual duplicate items and you can answer your original question with:
collection.Duplicates().Any()

Just a complement to the existing solution:
public static bool ContainsDuplicates<T>(this IEnumerable<T> items)
{
return ContainsDuplicates(items, EqualityComparer<T>.Default);
}
public static bool ContainsDuplicates<T>(this IEnumerable<T> items, IEqualityComparer<T> equalityComparer)
{
var set = new HashSet<T>(equalityComparer);
foreach (var item in items)
{
if (!set.Add(item))
return true;
}
return false;
}
This version lets you pick an equality comparer, this may prove useful if you want to compare items based on non-default rules.
For instance, to compare a a set of strings case insensitively, just pass it StringComparer.OrdinalIgnoreCase.

Related

Any benefit of using yield in this case?

I am maintaining some code at work and the original author is gone so thought I would ask here to see if I can satisfy my curiosity.
Below is a bit of code (anonymized) where yield is being used. As far as I can tell it does not add any benefit and just returning a list would be sufficient, maybe more readable as well (for me at least). Just wondering if I am missing something because this pattern is repeated in a couple of places in the code base.
public virtual IEnumerable<string> ValidProductTypes
{
get
{
yield return ProductTypes.Type1;
yield return ProductTypes.Type2;
yield return ProductTypes.Type3;
}
}
This property is used as a parameter for some class which just uses it to populate a collection:
var productManager = new ProductManager(ValidProductTypes);
public ProductManager(IEnumerable<string> validProductTypes)
{
var myFilteredList = GetFilteredTypes(validProductTypes);
}
public ObservableCollection<ValidType> GetFilteredTypes(IEnumerable<string> validProductTypes)
{
var filteredList = validProductTypes
.Where(type => TypeIsValid); //TypeIsValid returns a ValidType
return new ObservableCollection<ValidType>(filteredList);
}
I'd say that returning an IEnumerable<T> and implementing that using yield return is the simplest option.
If you see that a method returns an IEnumerable<T>, there really is only one thing you can do with it: iterate it. Any more complicated operations on it (like using LINQ) are just encapsulated specific ways of iterating it.
If a method returns an array or list, you also gain the ability to mutate it and you might start wondering if that's an acceptable use of the API. For example, what happens if you do ValidProductTypes.Add("a new product")?
If you're talking just about the implementation, then the difference becomes much smaller. But the caller would still be able to cast the returned array or list from IEnumerable<T> to its concrete type and mutate that. The chance that anyone would actually think this was the intended use of the API is small, but with yield return, the chance is zero, because it's not possible.
Considering that I'd say the syntax has roughly the same complexity and ease of understanding, I think yield return is a reasonable choice. Though with C# 6.0 expression bodied properties, the syntax for arrays might get the upper hand:
public virtual IEnumerable<string> ValidProductTypes =>
new[] { ProductTypes.Type1, ProductTypes.Type2, ProductTypes.Type3 };
The above answer is assuming that this is not performance-critical code, so fairly small differences in performance won't matter. If this is performance-critical code, then you should measure which option is better. And you might also want to consider getting rid of allocations (probably by caching the array in a field, or something like that), which might be the dominant factor.

Basic array Any() vs Length

I have a simple array of objects:
Contact[] contacts = _contactService.GetAllContacts();
I want to test if that method returns any contacts. I really like the LINQ syntax for Any() as it highlights what I am trying to achieve:
if(!contacts.Any()) return;
However, is this slower than just testing the length of the array?
if(contacts.Length == 0) return;
Is there any way I can find out what kind of operation Any() performs in this instance without having to go to here to ask? Something like a Profiler, but for in-memory collections?
There are two Any() methods:
1. An extension method for IEnumerable<T>
2. An extension method for IQueryable<T>
I'm guessing that you're using the extension method for IEnumerable<T>. That one looks like this:
public static bool Any<T>(this IEnumerable<T> enumerable)
{
foreach (var item in enumerable)
{
return true;
}
return false;
}
Basically, using Length == 0 is faster because it doesn't involve creating an iterator for the array.
If you want to check out code that isn't yours (that is, code that has already been compiled), like Any<T>, you can use some kind of disassembler. Jetbrains has one for free - http://www.jetbrains.com/decompiler/
I have to completely disagree with the other answers. It certainly does not iterate over the array. It will be marginally slower, as it needs to create an array iterator object and call MoveNext() once, but that cost should be negligible in most scenarios; if Any() makes the code more readable to you, feel free to use it.
Source: Decompiled Enumerable.Any<TSource> code.
If you have a array the Length is in a property of the array. When calling Any you are iterate the array to find the first element. Setting up the enumerator is probably more expensive then just reading the Length property.
In your very case Length is slightly better:
// Just private field test if it's zero or not
if (contacts.Length == 0)
return;
// Linq overhead could be added: e.g. a for loop
// for (int i = 0; i < contains.Length; ++i)
// return true;
// plus inevitable private field test (i < contains.Length)
if (!contacts.Any())
return;
But the difference seems being negligible.
In general case, however, Any is better, because it stops on the first item found
// Itterates until 1st item is found
if (contains.Any(x => MyCondition(x)))
return;
// Itterates the entire collection
if (contains.Select(x => MyCondition(x)).Count() > 0)
return;
Yes, it is slower because it iterate over the elements.Using Length property is better. But still I don't think there is a significant difference because Any returns true as soon as it finds an item.

LINQ or foreach - style/readability and speed

I have a piece of code for some validation logic, which in generalized for goes like this:
private bool AllItemsAreSatisfactoryV1(IEnumerable<Source> collection)
{
foreach(var foo in collection)
{
Target target = SomeFancyLookup(foo);
if (!target.Satisfactory)
{
return false;
}
}
return true;
}
This works, is pretty easy to understand, and has early-out optimization. It is, however, pretty verbose. The main purpose of this question is what is considered readable and good style. I'm also interested in the performance; I'm a firm believer that premature {optimization, pessimization} is the root of all evil, and try to avoid micro-optimizing as well as introducing bottlenecks.
I'm pretty new to LINQ, so I'd like some comments on the two alternative versions I've come up with, as well as any other suggestions wrt. readability.
private bool AllItemsAreSatisfactoryV2(IEnumerable<Source> collection)
{
return null ==
(from foo in collection
where !(SomeFancyLookup(foo).Satisfactory)
select foo).First();
}
private bool AllItemsAreSatisfactoryV3(IEnumerable<Source> collection)
{
return !collection.Any(foo => !SomeFancyLookup(foo).Satisfactory);
}
I don't believe that V2 offers much over V1 in terms of readability, even if shorter. I find V3 to be clear & concise, but I'm not too fond of the Method().Property part; of course I could turn the lambda into a full delegate, but then it loses it's one-line elegance.
What I'd like comments on are:
Style - ever so subjective, but what do you feel is readable?
Performance - are any of these a definite no-no? As far as I understand, all three methods should early-out.
Debuggability - anything to consider?
Alternatives - anything goes.
Thanks in advance :)
I think All would be clearer:
private bool AllItemsAreSatisfactoryV1(IEnumerable<Source> collection)
{
return collection.Select(f => SomeFancyLookup(f)).All(t => t.Satisfactory);
}
I think it's unlikely using linq here would cause a performance problem over a regular foreach loop, although it would be straightforward to change if it did.
I personally have no problem with the style of V3, and that one would be my first choice. You're essentially looking through the list for any whose lookup is not satisfactory.
V2 is difficult to grasp the intent of, and in its current form will throw an exception (First() requires the source IEnumerable to not be empty; I think you're looking for FirstOrDefault()). Why not just tack Any() on the end instead of comparing a result from the list to null?
V1 is fine, if a bit loquacious, and probably the easiest to debug, as I've found debugging lambdas to be a bit persnickety at times. You can remove the inner braces to lose some whitespace without sacrificing readability.
Really, all three will boil down into very similar opcodes; iterate through collection, call SomeFancyLookup(), and check a property of its return value; get out on the first failure. Any() "hides" a very similar foreach algorithm. The difference between V1 and all others is the use of a named variable, which MIGHT be a little less performant, but you have a reference to a Target in all three cases so I doubt it's significant, if a difference even exists.

foreach vs someList.ForEach(){}

There are apparently many ways to iterate over a collection. Curious if there are any differences, or why you'd use one way over the other.
First type:
List<string> someList = <some way to init>
foreach(string s in someList) {
<process the string>
}
Other Way:
List<string> someList = <some way to init>
someList.ForEach(delegate(string s) {
<process the string>
});
I suppose off the top of my head, that instead of the anonymous delegate I use above, you'd have a reusable delegate you could specify...
There is one important, and useful, distinction between the two.
Because .ForEach uses a for loop to iterate the collection, this is valid (edit: prior to .net 4.5 - the implementation changed and they both throw):
someList.ForEach(x => { if(x.RemoveMe) someList.Remove(x); });
whereas foreach uses an enumerator, so this is not valid:
foreach(var item in someList)
if(item.RemoveMe) someList.Remove(item);
tl;dr: Do NOT copypaste this code into your application!
These examples aren't best practice, they are just to demonstrate the differences between ForEach() and foreach.
Removing items from a list within a for loop can have side effects. The most common one is described in the comments to this question.
Generally, if you are looking to remove multiple items from a list, you would want to separate the determination of which items to remove from the actual removal. It doesn't keep your code compact, but it guarantees that you do not miss any items.
We had some code here (in VS2005 and C#2.0) where the previous engineers went out of their way to use list.ForEach( delegate(item) { foo;}); instead of foreach(item in list) {foo; }; for all the code that they wrote. e.g. a block of code for reading rows from a dataReader.
I still don't know exactly why they did this.
The drawbacks of list.ForEach() are:
It is more verbose in C# 2.0. However, in C# 3 onwards, you can use the "=>" syntax to make some nicely terse expressions.
It is less familiar. People who have to maintain this code will wonder why you did it that way. It took me awhile to decide that there wasn't any reason, except maybe to make the writer seem clever (the quality of the rest of the code undermined that). It was also less readable, with the "})" at the end of the delegate code block.
See also Bill Wagner's book "Effective C#: 50 Specific Ways to Improve Your C#" where he talks about why foreach is preferred to other loops like for or while loops - the main point is that you are letting the compiler decide the best way to construct the loop. If a future version of the compiler manages to use a faster technique, then you will get this for free by using foreach and rebuilding, rather than changing your code.
a foreach(item in list) construct allows you to use break or continue if you need to exit the iteration or the loop. But you cannot alter the list inside a foreach loop.
I'm surprised to see that list.ForEach is slightly faster. But that's probably not a valid reason to use it throughout , that would be premature optimisation. If your application uses a database or web service that, not loop control, is almost always going to be be where the time goes. And have you benchmarked it against a for loop too? The list.ForEach could be faster due to using that internally and a for loop without the wrapper would be even faster.
I disagree that the list.ForEach(delegate) version is "more functional" in any significant way. It does pass a function to a function, but there's no big difference in the outcome or program organisation.
I don't think that foreach(item in list) "says exactly how you want it done" - a for(int 1 = 0; i < count; i++) loop does that, a foreach loop leaves the choice of control up to the compiler.
My feeling is, on a new project, to use foreach(item in list) for most loops in order to adhere to the common usage and for readability, and use list.Foreach() only for short blocks, when you can do something more elegantly or compactly with the C# 3 "=>" operator. In cases like that, there may already be a LINQ extension method that is more specific than ForEach(). See if Where(), Select(), Any(), All(), Max() or one of the many other LINQ methods doesn't already do what you want from the loop.
As they say, the devil is in the details...
The biggest difference between the two methods of collection enumeration is that foreach carries state, whereas ForEach(x => { }) does not.
But lets dig a little deeper, because there are some things you should be aware of that can influence your decision, and there are some caveats you should be aware of when coding for either case.
Lets use List<T> in our little experiment to observe behavior. For this experiment, I am using .NET 4.7.2:
var names = new List<string>
{
"Henry",
"Shirley",
"Ann",
"Peter",
"Nancy"
};
Lets iterate over this with foreach first:
foreach (var name in names)
{
Console.WriteLine(name);
}
We could expand this into:
using (var enumerator = names.GetEnumerator())
{
}
With the enumerator in hand, looking under the covers we get:
public List<T>.Enumerator GetEnumerator()
{
return new List<T>.Enumerator(this);
}
internal Enumerator(List<T> list)
{
this.list = list;
this.index = 0;
this.version = list._version;
this.current = default (T);
}
public bool MoveNext()
{
List<T> list = this.list;
if (this.version != list._version || (uint) this.index >= (uint) list._size)
return this.MoveNextRare();
this.current = list._items[this.index];
++this.index;
return true;
}
object IEnumerator.Current
{
{
if (this.index == 0 || this.index == this.list._size + 1)
ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumOpCantHappen);
return (object) this.Current;
}
}
Two things become immediate evident:
We are returned a stateful object with intimate knowledge of the underlying collection.
The copy of the collection is a shallow copy.
This is of course in no way thread safe. As was pointed out above, changing the collection while iterating is just bad mojo.
But what about the problem of the collection becoming invalid during iteration by means outside of us mucking with the collection during iteration? Best practices suggests versioning the collection during operations and iteration, and checking versions to detect when the underlying collection changes.
Here's where things get really murky. According to the Microsoft documentation:
If changes are made to the collection, such as adding, modifying, or
deleting elements, the behavior of the enumerator is undefined.
Well, what does that mean? By way of example, just because List<T> implements exception handling does not mean that all collections that implement IList<T> will do the same. That seems to be a clear violation of the Liskov Substitution Principle:
Objects of a superclass shall be replaceable with objects of its
subclasses without breaking the application.
Another problem is that the enumerator must implement IDisposable -- that means another source of potential memory leaks, not only if the caller gets it wrong, but if the author does not implement the Dispose pattern correctly.
Lastly, we have a lifetime issue... what happens if the iterator is valid, but the underlying collection is gone? We now a snapshot of what was... when you separate the lifetime of a collection and its iterators, you are asking for trouble.
Lets now examine ForEach(x => { }):
names.ForEach(name =>
{
});
This expands to:
public void ForEach(Action<T> action)
{
if (action == null)
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
int version = this._version;
for (int index = 0; index < this._size && (version == this._version || !BinaryCompatibility.TargetsAtLeast_Desktop_V4_5); ++index)
action(this._items[index]);
if (version == this._version || !BinaryCompatibility.TargetsAtLeast_Desktop_V4_5)
return;
ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumFailedVersion);
}
Of important note is the following:
for (int index = 0; index < this._size && ... ; ++index)
action(this._items[index]);
This code does not allocate any enumerators (nothing to Dispose), and does not pause while iterating.
Note that this also performs a shallow copy of the underlying collection, but the collection is now a snapshot in time. If the author does not correctly implement a check for the collection changing or going 'stale', the snapshot is still valid.
This doesn't in any way protect you from the problem of the lifetime issues... if the underlying collection disappears, you now have a shallow copy that points to what was... but at least you don't have a Dispose problem to deal with on orphaned iterators...
Yes, I said iterators... sometimes its advantageous to have state. Suppose you want to maintain something akin to a database cursor... maybe multiple foreach style Iterator<T>'s is the way to go. I personally dislike this style of design as there are too many lifetime issues, and you rely on the good graces of the authors of the collections you are relying on (unless you literally write everything yourself from scratch).
There is always a third option...
for (var i = 0; i < names.Count; i++)
{
Console.WriteLine(names[i]);
}
It ain't sexy, but its got teeth (apologies to Tom Cruise and the movie The Firm)
Its your choice, but now you know and it can be an informed one.
For fun, I popped List into reflector and this is the resulting C#:
public void ForEach(Action<T> action)
{
if (action == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
}
for (int i = 0; i < this._size; i++)
{
action(this._items[i]);
}
}
Similarly, the MoveNext in Enumerator which is what is used by foreach is this:
public bool MoveNext()
{
if (this.version != this.list._version)
{
ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumFailedVersion);
}
if (this.index < this.list._size)
{
this.current = this.list._items[this.index];
this.index++;
return true;
}
this.index = this.list._size + 1;
this.current = default(T);
return false;
}
The List.ForEach is much more trimmed down than MoveNext - far less processing - will more likely JIT into something efficient..
In addition, foreach() will allocate a new Enumerator no matter what. The GC is your friend, but if you're doing the same foreach repeatedly, this will make more throwaway objects, as opposed to reusing the same delegate - BUT - this is really a fringe case. In typical usage you will see little or no difference.
I guess the someList.ForEach() call could be easily parallelized whereas the normal foreach is not that easy to run parallel.
You could easily run several different delegates on different cores, which is not that easy to do with a normal foreach.
Just my 2 cents
I know two obscure-ish things that make them different. Go me!
Firstly, there's the classic bug of making a delegate for each item in the list. If you use the foreach keyword, all your delegates can end up referring to the last item of the list:
// A list of actions to execute later
List<Action> actions = new List<Action>();
// Numbers 0 to 9
List<int> numbers = Enumerable.Range(0, 10).ToList();
// Store an action that prints each number (WRONG!)
foreach (int number in numbers)
actions.Add(() => Console.WriteLine(number));
// Run the actions, we actually print 10 copies of "9"
foreach (Action action in actions)
action();
// So try again
actions.Clear();
// Store an action that prints each number (RIGHT!)
numbers.ForEach(number =>
actions.Add(() => Console.WriteLine(number)));
// Run the actions
foreach (Action action in actions)
action();
The List.ForEach method doesn't have this problem. The current item of the iteration is passed by value as an argument to the outer lambda, and then the inner lambda correctly captures that argument in its own closure. Problem solved.
(Sadly I believe ForEach is a member of List, rather than an extension method, though it's easy to define it yourself so you have this facility on any enumerable type.)
Secondly, the ForEach method approach has a limitation. If you are implementing IEnumerable by using yield return, you can't do a yield return inside the lambda. So looping through the items in a collection in order to yield return things is not possible by this method. You'll have to use the foreach keyword and work around the closure problem by manually making a copy of the current loop value inside the loop.
More here
You could name the anonymous delegate :-)
And you can write the second as:
someList.ForEach(s => s.ToUpper())
Which I prefer, and saves a lot of typing.
As Joachim says, parallelism is easier to apply to the second form.
List.ForEach() is considered to be more functional.
List.ForEach() says what you want done. foreach(item in list) also says exactly how you want it done. This leaves List.ForEach free to change the implementation of the how part in the future. For example, a hypothetical future version of .Net might always run List.ForEach in parallel, under the assumption that at this point everyone has a number of cpu cores that are generally sitting idle.
On the other hand, foreach (item in list) gives you a little more control over the loop. For example, you know that the items will be iterated in some kind of sequential order, and you could easily break in the middle if an item meets some condition.
Some more recent remarks on this issue are available here:
https://stackoverflow.com/a/529197/3043
The entire ForEach scope (delegate function) is treated as a single line of code (calling the function), and you cannot set breakpoints or step into the code. If an unhandled exception occurs the entire block is marked.
Behind the scenes, the anonymous delegate gets turned into an actual method so you could have some overhead with the second choice if the compiler didn't choose to inline the function. Additionally, any local variables referenced by the body of the anonymous delegate example would change in nature because of compiler tricks to hide the fact that it gets compiled to a new method. More info here on how C# does this magic:
http://blogs.msdn.com/oldnewthing/archive/2006/08/04/688527.aspx
The ForEach function is member of the generic class List.
I have created the following extension to reproduce the internal code:
public static class MyExtension<T>
{
public static void MyForEach(this IEnumerable<T> collection, Action<T> action)
{
foreach (T item in collection)
action.Invoke(item);
}
}
So a the end we are using a normal foreach (or a loop for if you want).
On the other hand, using a delegate function is just another way to define a function, this code:
delegate(string s) {
<process the string>
}
is equivalent to:
private static void myFunction(string s, <other variables...>)
{
<process the string>
}
or using labda expressions:
(s) => <process the string>
The second way you showed uses an extension method to execute the delegate method for each of the elements in the list.
This way, you have another delegate (=method) call.
Additionally, there is the possibility to iterate the list with a for loop.
One thing to be wary of is how to exit from the Generic .ForEach method - see this discussion. Although the link seems to say that this way is the fastest. Not sure why - you'd think they would be equivalent once compiled...

Probably BAD coding style ... please comment

I am checking whether the new name already exists or not.
Code 1
if(cmbxExistingGroups.Properties.Items.Cast<string>().ToList().Exists(txt => txt==txtNewGroup.Text.Trim())) {
MessageBox.Show("already exists.", "Add new group");
}
Otherwise I could have written:
Code 2
foreach(var str in cmbxExistingGroups.Properties.Items)
{
if(str==txtNewGroup.Text) {
MessageBox.Show("already exists.", "Add new group");
break;
}
}
I wrote these two and thought I was exploiting language features in code 1.
...and yes: both of them work for me ... I am wondering about the performance :-/
I appreciate the cleverness of the first sample (assuming it works), but the second one is a lot easier for the next person who has to maintain the code to figure out.
Sometimes just a little indentation makes a world of difference:
if (cmbxExistingGroups.Properties.Items
.Cast<string>().ToList()
.Exists
(
txt => txt==txtNewGroup.Text.Trim()
))
{
MessageBox.Show("already exists.", "Add new group");
}
Since your using a List<String>, you might as well just drop the Exists predicate and use Contains...use Exists when comparing complex objects by unique values.
I've quoted it before but I'll do it again:
Write your code as if the person maintaining it is a homicidal maniac
who knows where you live.
would
cmbxExistingGroups.Properties.Items.Contains(text)
not work instead?
There are a few things wrong here:
1) The two bits of code don't do the same thing - the first looks for the trimmed version of txtNewGroup, the second just looks for txtNewGroup
2) There's no point in calling ToList() - that just make things less efficient
3) Using Exists with a predicate is overkill - Contains is all you need here
So, the first could easily come down to:
if (cmbxExistingGroups.Properties.Items.Cast<string>.Contains(txtNewGroup.Text))
{
// Stuff
}
I'd probably create a variable to give "cmbxExistingGroups.Properties.Items.Cast" a meaningful, simple name - but then I'd say it's easier to understand than the explicit foreach loop.
The first code bit is fine, except instead of calling Enumerable.ToList() and List<T>.Exists(), you should just call Enumerable.Any() -- it does a lazy evaluation, so it never allocates the memory for the List<T>, and it will stop enumerating cmbxExistingGroups.Properties.Items and casting them to string. Also, calling the trim from inside that predicate means it happens for every item it looks at. It would be best to move it out to the outer scope:
string match = txtNewGroup.Text.Trim();
if(cmbxExistingGroups.Properties.Items.Cast<string>().Any(txt => txt==match)) {
MessageBox.Show("already exists.", "Add new group");
}
Verbosity in coding is not always bad at all. I prefer the second code snippet a lot over the first one. Just imagine you would have to maintain (or even change the functionality of) the first example... um.
Well, if it were me, it would be a variation on 2. Always prefer readability over one-liners. Additionally, always extract a method to make it clearer.
your calling code becomes
if( cmbxExistingGroups.ContainsKey(txtNewGroup.Text) )
{
MessageBox.Show("Already Exists");
}
If you define an extension method for Combo Boxes
public static class ComboBoxExtensions
{
public static bool ContainsKey(this ComboBox comboBox, string key)
{
foreach (string existing in comboBox.Items)
{
if (string.Equals(key, existing))
{
return true;
}
}
return false;
}
}
First, they're not equivalent. The 1st sample does a check against txtNewSGroup.Text.Trim(), the 2nd omits trim. Also, the 1st casts everything to a string, whereas the second uses whatever comes out of the iterator. I assume that's an object, or you wouldn't have needed the cast in the 1st place.
So, to be fair, the closest equivalent to the 2nd sample in the LINQ style would be:
if (mbxExistingGroups.Properties.Items.Cast<string>().Contains(txtNewGroup.Text)) {
...
}
which isn't too bad. But, since you seem to be working with old style IEnumerable instead of new fangled IEnumerable<T>, why don't we give you another extension method:
public static Contains<T>(this IEnumerable e, T value) {
return e.Cast<T>().Contains(value);
}
And now we have:
if (mbxExistingGroups.Properties.Items.Contains(txtNewGroup.Text)) {
...
}
which is pretty readable IMO.
I would agree, go with the second one because it will be easier to maintain for anybody else who works on it and when you come back to that in 6-12 months, it will be easier to remember what you were doing.
both of them works for me ..i am wonodering about the performance
I see no one read the question :) I think I see what you're doing (I don't use this language). The first tries to generate the list and test it in one shot. The second does an explicit iteration and can "short circuit" itself (exit early) if it finds the duplicate early on. The question is whether the "all at once" is more efficient due to the language implementation.
The second of the two would perform better, and it would perform the same as other people's samples that use Contains.
The reason why the first one uses an extra trim. plus a conversion to list. so it iterates once for conversion, then starts again to check using exists, and does a trim each time, but will exit iteration if found. The second starts iterating once, has no trim, and will exit if found.
So in short the answer to your question is the second performs much better.
From a performance point of view:
txtNewGroup.Text.Trim()
Do your control interaction/string manipulation outside of the loop - one time, instead of n times.
I imagine that on the WTF's per minute scale, the first would be off the chart. Count the dots, any more than two per line is a potential problem

Categories

Resources