Collection of strings to dictionary - c#

Given an ordered collection of strings:
var strings = new string[] { "abc", "def", "def", "ghi", "ghi", "ghi", "klm" };
Use LINQ to create a dictionary of string to number of occurrences of that string in the collection:
IDictionary<string,int> stringToNumOccurrences = ...;
Preferably do this in a single pass over the strings collection...

var dico = strings.GroupBy(x => x).ToDictionary(x => x.Key, x => x.Count());

Timwi/Darin's suggestion will perform this in a single pass over the original collection, but it will create multiple buffers for the groupings. LINQ isn't really very good at doing this kind of counting, and a problem like this was my original motivation for writing Push LINQ. You might like to read my blog post on it for more details about why LINQ isn't terribly efficient here.
Push LINQ and the rather more impressive implementation of the same idea - Reactive Extensions - can handle this more efficiently.
Of course, if you don't really care too much about the extra efficiency, go with the GroupBy answer :)
EDIT: I hadn't noticed that your strings were ordered. That means you can be much more efficient, because you know that once you've seen string x and then string y, if x and y are different, you'll never see x again. There's nothing in LINQ to make this particularly easier, but you can do it yourself quite easily:
public static IDictionary<string, int> CountEntries(IEnumerable<string> strings)
{
var dictionary = new Dictionary<string, int>();
using (var iterator = strings.GetEnumerator())
{
if (!iterator.MoveNext())
{
// No entries
return dictionary;
}
string current = iterator.Current;
int currentCount = 1;
while (iterator.MoveNext())
{
string next = iterator.Current;
if (next == current)
{
currentCount++;
}
else
{
dictionary[current] = currentCount;
current = next;
currentCount = 1;
}
}
// Write out the trailing result
dictionary[current] = currentCount;
}
return dictionary;
}
This is O(n), with no dictionary lookups involved other than when writing the values. An alternative implementation would use foreach and a current value starting off at null... but that ends up being pretty icky in a couple of other ways. (I've tried it :) When I need special-case handling for the first value, I generally go with the above pattern.
Actually you could do this with LINQ using Aggregate, but it would be pretty nasty.
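For concreteness, here is a minimal sketch of what an Aggregate-based version might look like. The class and method names are made up for illustration, and it does not exploit the fact that the input is ordered; it simply folds the sequence into a dictionary in one pass.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregateCounting
{
    // Folds the sequence into a dictionary of counts in a single pass.
    public static Dictionary<string, int> CountWithAggregate(IEnumerable<string> strings)
    {
        return strings.Aggregate(
            new Dictionary<string, int>(),
            (counts, s) =>
            {
                // TryGetValue leaves 'existing' at 0 when the key is absent.
                counts.TryGetValue(s, out int existing);
                counts[s] = existing + 1;
                return counts;
            });
    }
}
```

The lambda mutates and returns the accumulator, which is part of why this reads as "pretty nasty" compared with GroupBy or a plain loop.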

The standard LINQ way is this:
stringToNumOccurrences = strings.GroupBy(s => s)
.ToDictionary(g => g.Key, g => g.Count());

If this is actual production code, I'd go with Timwi's response.
If this is indeed homework and you're expected to write your own implementation, it shouldn't be too tough. Here are just a couple of hints to point you in the right direction:
Dictionary<TKey, TValue> has a ContainsKey method.
The IDictionary<TKey, TValue> interface's this[TKey] property is settable; i.e., you can do dictionary[key] = 1 (which means you can also do dictionary[key] += 1).
From those clues I think you should be able to figure out how to do it "by hand."

If you are looking for a particularly efficient (fast) solution, then GroupBy is probably too slow for you. You could use a loop:
var strings = new string[] { "abc", "def", "def", "ghi", "ghi", "ghi", "klm" };
var stringToNumOccurrences = new Dictionary<string, int>();
foreach (var str in strings)
{
if (stringToNumOccurrences.ContainsKey(str))
stringToNumOccurrences[str]++;
else
stringToNumOccurrences[str] = 1;
}
return stringToNumOccurrences;

This is a foreach version like the one that Jon mentions that he finds "pretty icky" in his answer. I'm putting it in here, so there's something concrete to talk about.
I must admit that I find it simpler than Jon's version and can't really see what's icky about it. Jon? Anyone?
static Dictionary<string, int> CountOrderedSequence(IEnumerable<string> source)
{
var result = new Dictionary<string, int>();
string prev = null;
int count = 0;
foreach (var s in source)
{
if (prev != s && count > 0)
{
result.Add(prev, count);
count = 0;
}
prev = s;
++count;
}
if (count > 0)
{
result.Add(prev, count);
}
return result;
}
Updated to add a necessary check for empty source - I still think it's simpler than Jon's :-)

How to remove a scriptable object from a list of scriptable objects? [duplicate]

I am looking for a better pattern for working with a list of elements, each of which needs to be processed and then, depending on the outcome, removed from the list.
You can't use .Remove(element) inside a foreach (var element in X) (because it results in Collection was modified; enumeration operation may not execute. exception)... you also can't use for (int i = 0; i < elements.Count(); i++) and .RemoveAt(i) because it disrupts your current position in the collection relative to i.
Is there an elegant way to do this?
Iterate your list in reverse with a for loop:
for (int i = safePendingList.Count - 1; i >= 0; i--)
{
// some code
// safePendingList.RemoveAt(i);
}
Example:
var list = new List<int>(Enumerable.Range(1, 10));
for (int i = list.Count - 1; i >= 0; i--)
{
if (list[i] > 5)
list.RemoveAt(i);
}
list.ForEach(i => Console.WriteLine(i));
Alternately, you can use the RemoveAll method with a predicate to test against:
safePendingList.RemoveAll(item => item.Value == someValue);
Here's a simplified example to demonstrate:
var list = new List<int>(Enumerable.Range(1, 10));
Console.WriteLine("Before:");
list.ForEach(i => Console.WriteLine(i));
list.RemoveAll(i => i > 5);
Console.WriteLine("After:");
list.ForEach(i => Console.WriteLine(i));
foreach (var item in list.ToList()) {
list.Remove(item);
}
If you add ".ToList()" to your list (or to the results of a LINQ query), you can remove "item" directly from "list" without the dreaded "Collection was modified; enumeration operation may not execute." error. ToList() makes a copy of "list" at run time, so the loop enumerates the copy while you safely remove from the original.
While this pattern is not super efficient, it has a natural feel and is flexible enough for almost any situation, such as when you want to save each "item" to a DB and remove it from the list only when the DB save succeeds.
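A rough sketch of the save-then-remove scenario, assuming the database call is represented by a hypothetical trySave delegate that returns true on success (the names here are illustrative, not from any real data-access API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SaveThenRemove
{
    // 'trySave' is a hypothetical stand-in for a database call.
    public static int ProcessAll(List<string> pending, Func<string, bool> trySave)
    {
        int saved = 0;
        // ToList() snapshots the list, so removing from 'pending' is safe here.
        foreach (var item in pending.ToList())
        {
            if (trySave(item))
            {
                pending.Remove(item);
                saved++;
            }
        }
        return saved;
    }
}
```

Items whose save fails stay in the list, so the same method can be called again later to retry them.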
A simple and straightforward solution:
Use a standard for-loop running backwards on your collection and RemoveAt(i) to remove elements.
Reverse iteration should be the first thing to come to mind when you want to remove elements from a Collection while iterating over it.
Luckily, there is a more elegant solution than writing a for loop which involves needless typing and can be error prone.
ICollection<int> test = new List<int>(new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10});
foreach (int myInt in test.Reverse<int>())
{
if (myInt % 2 == 0)
{
test.Remove(myInt);
}
}
Using the ToArray() on a generic list allows you to do a Remove(item) on your generic List:
List<String> strings = new List<string>() { "a", "b", "c", "d" };
foreach (string s in strings.ToArray())
{
if (s == "b")
strings.Remove(s);
}
Select the elements you do want rather than trying to remove the elements you don't want. This is so much easier (and generally more efficient too) than removing elements.
var newSequence = (from el in list
where el.Something || el.AnotherThing < 0
select el);
I wanted to post this as a comment in response to the comment left by Michael Dillon below, but it's too long and probably useful to have in my answer anyway:
Personally, I'd never remove items one by one. If you do need removal, call RemoveAll, which takes a predicate and only rearranges the internal array once, whereas Remove does an Array.Copy operation for every element you remove. RemoveAll is vastly more efficient.
And when you're backwards iterating over a list, you already have the index of the element you want to remove, so it would be far more efficient to call RemoveAt, because Remove first does a traversal of the list to find the index of the element you're trying to remove, but you already know that index.
So all in all, I don't see any reason to ever call Remove in a for-loop. And ideally, if it is at all possible, use the above code to stream elements from the list as needed so no second data structure has to be created at all.
Using .ToList() will make a copy of your list, as explained in this question:
ToList()-- Does it Create a New List?
By using ToList(), you can remove from your original list, because you're actually iterating over a copy.
foreach (var item in listTracked.ToList()) {
if (DetermineIfRequiresRemoval(item)) {
listTracked.Remove(item);
}
}
If the function that determines which items to delete has no side effects and doesn't mutate the item (it's a pure function), a simple and efficient (linear time) solution is:
list.RemoveAll(condition);
If there are side effects, I'd use something like:
var toRemove = new HashSet<T>();
foreach(var item in items)
{
...
if(condition)
toRemove.Add(item);
}
items.RemoveAll(toRemove.Contains);
This is still linear time, assuming the hash is good. But it has an increased memory use due to the hashset.
Finally if your list is only an IList<T> instead of a List<T> I suggest my answer to How can I do this special foreach iterator?. This will have linear runtime given typical implementations of IList<T>, compared with quadratic runtime of many other answers.
As any remove is taken on a condition you can use
list.RemoveAll(item => item.Value == someValue);
List<T> TheList = new List<T>();
TheList.FindAll(element => element.Satisfies(Condition)).ForEach(element => TheList.Remove(element));
You can't use foreach, but you could iterate forwards and manage your loop index variable when you remove an item, like so:
for (int i = 0; i < elements.Count; i++)
{
if (<condition>)
{
// Decrement the loop counter to iterate this index again, since later elements will get moved down during the remove operation.
elements.RemoveAt(i--);
}
}
Note that in general all of these techniques rely on the behaviour of the collection being iterated. The technique shown here will work with the standard List(T). (It is quite possible to write your own collection class and iterator that does allow item removal during a foreach loop.)
For loops are a bad construct for this.
Using while
var numbers = new List<int>(Enumerable.Range(1, 3));
while (numbers.Count > 0)
{
numbers.RemoveAt(0);
}
But, if you absolutely must use for
var numbers = new List<int>(Enumerable.Range(1, 3));
for (; numbers.Count > 0;)
{
numbers.RemoveAt(0);
}
Or, this:
public static class Extensions
{
public static IList<T> Remove<T>(
this IList<T> numbers,
Func<T, bool> predicate)
{
numbers.ForEachBackwards(predicate, (n, index) => numbers.RemoveAt(index));
return numbers;
}
public static void ForEachBackwards<T>(
this IList<T> numbers,
Func<T, bool> predicate,
Action<T, int> action)
{
for (var i = numbers.Count - 1; i >= 0; i--)
{
if (predicate(numbers[i]))
{
action(numbers[i], i);
}
}
}
}
Usage:
var numbers = new List<int>(Enumerable.Range(1, 10)).Remove((n) => n > 5);
However, List<T> already has RemoveAll() to do this (it is a List<T> method rather than part of LINQ):
var numbers = new List<int>(Enumerable.Range(1, 10));
numbers.RemoveAll((n) => n > 5);
Lastly, you are probably better off using LINQ's Where() to filter and create a new list instead of mutating the existing list. Immutability is usually good.
var numbers = new List<int>(Enumerable.Range(1, 10))
.Where((n) => n <= 5)
.ToList();
Using Remove or RemoveAt on a list while iterating over that list has intentionally been made difficult, because it is almost always the wrong thing to do. You might be able to get it working with some clever trick, but it would be extremely slow. Every time you call Remove it has to scan through the entire list to find the element you want to remove. Every time you call RemoveAt it has to move subsequent elements 1 position to the left. As such, any solution using Remove or RemoveAt, would require quadratic time, O(n²).
Use RemoveAll if you can. Otherwise, the following pattern will filter the list in-place in linear time, O(n).
// Create a list to be filtered
IList<int> elements = new List<int>(new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10});
// Filter the list
int kept = 0;
for (int i = 0; i < elements.Count; i++) {
// Test whether this is an element that we want to keep.
if (elements[i] % 3 > 0) {
// Add it to the list of kept elements.
elements[kept] = elements[i];
kept++;
}
}
// Unfortunately IList has no Resize method. So instead we
// remove the last element of the list until: elements.Count == kept.
while (kept < elements.Count) elements.RemoveAt(elements.Count-1);
I would reassign the list from a LINQ query that filtered out the elements you didn't want to keep.
list = list.Where(item => ...).ToList();
Unless the list is very large there should be no significant performance problems in doing this.
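One thing to keep in mind with reassignment is that it builds a new list rather than changing the existing one, so any other reference to the original list still sees every item. A small sketch (the class and variable names are made up for illustration):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FilterVersusRemove
{
    public static (List<int> original, List<int> filtered) Demo()
    {
        var list = new List<int> { 1, 2, 3, 4, 5, 6 };
        var alias = list;                         // second reference to the same list
        list = list.Where(n => n <= 3).ToList();  // 'list' now points at a new list
        return (alias, list);                     // 'alias' still holds all six items
    }
}
```

If other code holds a reference to the list and must see the removal, RemoveAll is probably the better fit.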
The best way to remove items from a list while iterating over it is to use RemoveAll(). The main concern people raise is that they need to do some complex work inside the loop and/or have complex comparison cases.
The solution is to still use RemoveAll() but use this notation:
var list = new List<int>(Enumerable.Range(1, 10));
list.RemoveAll(item =>
{
// Do some complex operations here
// Or even some operations on the items
SomeFunction(item);
// In the end return true if the item is to be removed. False otherwise
return item > 5;
});
Assuming that predicate is a Boolean property of an element and that the element should be removed when it is true:
int i = 0;
while (i < list.Count())
{
if (list[i].predicate == true)
{
list.RemoveAt(i);
continue;
}
i++;
}
In C# one easy way is to mark the ones you wish to delete then create a new list to iterate over...
foreach(var item in list.ToList()){if(item.Delete) list.Remove(item);}
or, even simpler, use List<T>.RemoveAll with a lambda:
list.RemoveAll(p=>p.Delete);
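However, it is worth considering whether other tasks or threads will have access to the same list while you are busy removing items. If so, guard the list with a lock or use a thread-safe collection from System.Collections.Concurrent instead (note that .NET has no built-in ConcurrentList).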
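As a sketch of the locking option, assuming several workers remove from one shared List<int> (the names and the 1..100 data are illustrative only):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LockedRemoval
{
    public static int RemoveEvensConcurrently()
    {
        var list = new List<int>();
        for (int i = 1; i <= 100; i++) list.Add(i);
        var gate = new object();

        // Each worker handles one value; the lock serializes access to the list.
        Parallel.For(1, 101, value =>
        {
            if (value % 2 == 0)
            {
                lock (gate) { list.Remove(value); }
            }
        });
        return list.Count;
    }
}
```

Every reader and writer of the list has to take the same lock for this to be safe; the lock only protects code that actually uses it.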
I wish the "pattern" was something like this:
foreach( thing in thingpile )
{
if( /* condition#1 */ )
{
foreach.markfordeleting( thing );
}
elseif( /* condition#2 */ )
{
foreach.markforkeeping( thing );
}
}
foreachcompleted
{
// then the programmer's choices would be:
// delete everything that was marked for deleting
foreach.deletenow(thingpile);
// ...or... keep only things that were marked for keeping
foreach.keepnow(thingpile);
// ...or even... make a new list of the unmarked items
others = foreach.unmarked(thingpile);
}
This would align the code with the process that goes on in the programmer's brain.
foreach(var item in list.ToList())
{
if(item.Delete) list.Remove(item);
}
Simply create an entirely new list from the first one. I say "Easy" rather than "Right" as creating an entirely new list probably comes at a performance premium over the previous method (I haven't bothered with any benchmarking.) I generally prefer this pattern, it can also be useful in overcoming Linq-To-Entities limitations.
for (int i = list.Count - 1; i >= 0; i--)
{
var item = list[i];
// The index is already known, so RemoveAt avoids a second search.
if (item.Delete) list.RemoveAt(i);
}
This way cycles through the list backwards with a plain old For loop. Doing this forwards could be problematic if the size of the collection changes, but backwards should always be safe.
Just wanted to add my 2 cents in case this helps anyone. I had a similar problem, but needed to remove multiple elements from an ArrayList while it was being iterated over. The highest-upvoted answer worked for the most part, until I ran into errors and realized that in some instances the index was greater than the size of the ArrayList, because multiple elements were being removed and the loop index didn't keep track of that. I fixed this with a simple check:
ArrayList place_holder = new ArrayList();
place_holder.Add("1");
place_holder.Add("2");
place_holder.Add("3");
place_holder.Add("4");
for(int i = place_holder.Count-1; i>= 0; i--){
if(i>= place_holder.Count){
i = place_holder.Count-1;
}
// some method that removes multiple elements here
}
There is an option that hasn't been mentioned here.
If you don't mind adding a bit of code somewhere in your project, you can add an extension to List that returns an instance of a class that iterates through the list in reverse.
You would use it like this :
foreach (var elem in list.AsReverse())
{
//Do stuff with elem
//list.Remove(elem); //Delete it if you want
}
And here is what the extension looks like:
public static class ReverseListExtension
{
public static ReverseList<T> AsReverse<T>(this List<T> list) => new ReverseList<T>(list);
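public class ReverseList<T> : IEnumerable<T>
{
private readonly List<T> list;
public ReverseList(List<T> list) { this.list = list; }
// Generic enumerator, so foreach yields T rather than object.
public IEnumerator<T> GetEnumerator()
{
for (int i = list.Count - 1; i >= 0; i--)
yield return list[i];
}
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}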
}
This is basically list.Reverse() without the allocation.
Like some have mentioned you still get the drawback of deleting elements one by one, and if your list is massively long some of the options here are better. But I think there is a world where someone would want the simplicity of list.Reverse(), without the memory overhead.
Copy the list you are iterating, then iterate over the copy and remove from the original. Going backwards is confusing and doesn't work well when looping in parallel. Note that List<T> is not thread-safe, so the removal inside a parallel loop needs a lock:
var ids = new List<int> { 1, 2, 3, 4 };
var iterableIds = ids.ToList();
Parallel.ForEach(iterableIds, id =>
{
lock (ids) { ids.Remove(id); }
});
I would do it like this:
using System.IO;
using System;
using System.Collections.Generic;
class Author
{
public string Firstname;
public string Lastname;
public int no;
}
class Program
{
private static bool isEven(int i)
{
return ((i % 2) == 0);
}
static void Main()
{
var authorsList = new List<Author>()
{
new Author{ Firstname = "Bob", Lastname = "Smith", no = 2 },
new Author{ Firstname = "Fred", Lastname = "Jones", no = 3 },
new Author{ Firstname = "Brian", Lastname = "Brains", no = 4 },
new Author{ Firstname = "Billy", Lastname = "TheKid", no = 1 }
};
authorsList.RemoveAll(item => isEven(item.no));
foreach(var auth in authorsList)
{
Console.WriteLine(auth.Firstname + " " + auth.Lastname);
}
}
}
OUTPUT
Fred Jones
Billy TheKid
I found myself in a similar situation where I had to remove every nth element in a given List<T>.
for (int i = 0, j = 0, n = 3; i < list.Count; i++)
{
if ((j + 1) % n == 0) //Check current iteration is at the nth interval
{
list.RemoveAt(i);
j++; //This extra addition is necessary. Without it j will wrap
//down to zero, which will throw off our index.
}
j++; //This will always advance the j counter
}
The cost of removing an item from the list is proportional to the number of items following the one to be removed. In the case where the first half of the items qualify for removal, any approach which is based upon removing items individually will end up having to perform about N*N/4 item-copy operations, which can get very expensive if the list is large.
A faster approach is to scan through the list to find the first item to be removed (if any), and then from that point forward copy each item which should be retained to the spot where it belongs. Once this is done, if R items should be retained, the first R items in the list will be those R items, and all of the items requiring deletion will be at the end. If those items are deleted in reverse order, the system won't end up having to copy any of them, so if the list had N items of which R items, including all of the first F, were retained, it will be necessary to copy R-F items, and shrink the list by one item N-R times. All of this takes linear time.
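The approach described above could be sketched roughly as follows for any IList<T>. The class and method names are made up for illustration, and this is a sketch rather than a tuned implementation:

```csharp
using System;
using System.Collections.Generic;

public static class CompactingRemove
{
    public static void RemoveWhere<T>(IList<T> list, Func<T, bool> shouldRemove)
    {
        // Find the first item to remove; everything before it stays put.
        int write = 0;
        while (write < list.Count && !shouldRemove(list[write])) write++;

        // Copy each later item that is retained down to its final slot.
        for (int read = write + 1; read < list.Count; read++)
        {
            if (!shouldRemove(list[read]))
            {
                list[write++] = list[read];
            }
        }

        // Trim the tail from the end so no elements need shifting.
        for (int i = list.Count - 1; i >= write; i--) list.RemoveAt(i);
    }
}
```

The predicate is evaluated exactly once per element, and each RemoveAt call targets the last position, so nothing has to be shifted during the trimming step.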
My approach is that I first create a list of indices, which should get deleted. Afterwards I loop over the indices and remove the items from the initial list. This looks like this:
var messageList = ...;
// Restrict your list to certain criteria
var customMessageList = messageList.FindAll(m => m.UserId == someId);
if (customMessageList != null && customMessageList.Count > 0)
{
// Create list with positions in origin list
List<int> positionList = new List<int>();
foreach (var message in customMessageList)
{
var position = messageList.FindIndex(m => m.MessageId == message.MessageId);
if (position != -1)
positionList.Add(position);
}
// To be able to remove the items in the origin list, we do it backwards
// so that the order of indices stays the same
positionList = positionList.OrderByDescending(p => p).ToList();
foreach (var position in positionList)
{
messageList.RemoveAt(position);
}
}
Mark the elements to be removed with a property, and remove them all after processing.
using System.Linq;
List<MyProperty> _Group = new List<MyProperty>();
// ... add elements
bool cond = false;
foreach (MyProperty currObj in _Group)
{
// here it is supposed that you decide the "remove conditions"...
cond = true; // set true or false...
if (cond)
{
// SET - element can be deleted
currObj.REMOVE_ME = true;
}
}
// RESET
_Group.RemoveAll(r => r.REMOVE_ME);
myList.RemoveAt(i--);
simples;

What is the best way to trim a list?

I have a List of strings. It's being generated elsewhere, but I will generate it below to help describe this simplified example:
var list = new List<string>();
list.Add("Joe");
list.Add("");
list.Add("Bill");
list.Add("Bill");
list.Add("");
list.Add("Scott");
list.Add("Joe");
list.Add("");
list.Add("");
list = TrimList(list);
I would like a function that "trims" this list; by trim I mean removing all items at the end of the list that are blank strings (the final two in this case).
NOTE: I still want to keep the blank one that is the second item in the list (or any other one that is not at the end), so I can't just do a .Where(r => !String.IsNullOrEmpty(r))
I would just write it without any LINQ, to be honest- after all, you're modifying a collection rather than just querying it:
void TrimList(List<string> list)
{
int lastNonEmpty = list.FindLastIndex(x => !string.IsNullOrEmpty(x));
int firstToRemove = lastNonEmpty + 1;
list.RemoveRange(firstToRemove, list.Count - firstToRemove);
}
If you actually want to create a new list, then the LINQ-based solutions are okay... although potentially somewhat inefficient (as Reverse has to buffer everything).
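Take advantage of Reverse and SkipWhile. Because list is a List<string>, call Enumerable.Reverse explicitly; list.Reverse() would bind to the in-place List<T>.Reverse(), which returns void.
list = Enumerable.Reverse(list).SkipWhile(s => String.IsNullOrEmpty(s)).Reverse().ToList();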
List<T> (not the interface) has a FindLastIndex method. Therefore you can wrap that in a method:
static IList<string> TrimList(List<string> input) {
return input.Take(input.FindLastIndex(x => !string.IsNullOrEmpty(x)) + 1)
.ToList();
}
This produces a copy, whereas Jon's modifies the list.
The only solution I can think of is to code a loop that starts at the end of the list and searches for an element that is not an empty string. Don't know of any library functions that would help. Once you know the last good element, you know which ones to remove.
Be careful not to modify the collection while you are iterating over it. Tends to break the iterator.
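A minimal sketch of that backwards loop, assuming the input is a List<string> (the class and method names are illustrative). It only reads by index while searching and does the removal afterwards in one call, so no iterator is involved:

```csharp
using System.Collections.Generic;

public static class TrailingBlankTrimmer
{
    public static void TrimTrailingBlanks(List<string> list)
    {
        // Walk back from the end until a non-blank element is found.
        int last = list.Count - 1;
        while (last >= 0 && string.IsNullOrEmpty(list[last])) last--;

        // Remove everything after the last non-blank element in one call.
        list.RemoveRange(last + 1, list.Count - (last + 1));
    }
}
```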
I always like to come up with the most generic solution possible. Why restrict yourself with lists and strings? Let's make an algorithm for generic enumerable!
public static class EnumerableExtensions
{
public static IEnumerable<T> TrimEnd<T>(this IEnumerable<T> enumerable, Predicate<T> predicate)
{
if (predicate == null)
{
throw new ArgumentNullException("predicate");
}
var accumulator = new LinkedList<T>();
foreach (var item in enumerable)
{
if (predicate(item))
{
accumulator.AddLast(item);
}
else
{
foreach (var accumulated in accumulator)
{
yield return accumulated;
}
accumulator.Clear();
yield return item;
}
}
}
}
Use it like this:
var list = new[]
{
"Joe",
"",
"Bill",
"Bill",
"",
"Scott",
"Joe",
"",
""
};
foreach (var item in list.TrimEnd(string.IsNullOrEmpty))
{
Console.WriteLine(item);
}

Linq Query On IDictionaryEnumerator Possible?

I need to clear items from cache that contain a specific string in the key. I have started with the following and thought I might be able to do a linq query
var enumerator = HttpContext.Current.Cache.GetEnumerator();
But I can't? I was hoping to do something like
var enumerator = HttpContext.Current.Cache.GetEnumerator().Key.Contains("subcat");
Any ideas on how I could achieve this?
The Enumerator created by the Cache generates DictionaryEntry objects. Furthermore, a Cache may have only string keys.
Thus, you can write the following:
var httpCache = HttpContext.Current.Cache;
var toRemove = httpCache.Cast<DictionaryEntry>()
.Select(de=>(string)de.Key)
.Where(key=>key.Contains("subcat"))
.ToArray(); //use .ToArray() to avoid concurrent modification issues.
foreach(var keyToRemove in toRemove)
httpCache.Remove(keyToRemove);
However, this is a potentially slow operation when the cache is large: the cache is not designed to be used like this. You should ask yourself whether an alternative design isn't possible and preferable. Why do you need to remove several cache keys at once, and why aren't you grouping cache keys by substring?
Since Cache is an IEnumerable, you can freely apply all LINQ methods you need to it. The only thing you need is to cast it to IEnumerable<DictionaryEntry>:
var keysQuery = HttpContext.Current.Cache
.Cast<DictionaryEntry>()
.Select(entry => (string)entry.Key)
.Where(key => key.Contains("subcat"));
Now keysQuery is a lazily evaluated sequence of all keys containing "subcat". But if you need to remove such entries from the cache, the simplest way is to just use a foreach statement.
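To illustrate the query-then-foreach idea without depending on System.Web, here is a sketch that uses a plain Hashtable as a stand-in for the cache (an assumption made purely so the example is self-contained; Hashtable also enumerates as DictionaryEntry values). The class and method names are made up:

```csharp
using System.Collections;
using System.Linq;

public static class CacheKeyRemoval
{
    public static int RemoveKeysContaining(Hashtable cache, string fragment)
    {
        var keys = cache.Cast<DictionaryEntry>()
                        .Select(entry => (string)entry.Key)
                        .Where(key => key.Contains(fragment))
                        .ToList();   // materialize before mutating the collection
        foreach (var key in keys) cache.Remove(key);
        return keys.Count;
    }
}
```

The ToList() call matters: without it the query would still be enumerating the collection while entries are being removed.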
I don't think it is a great idea to walk the entire cache anyway, but you could do it non-LINQ with something like:
var iter = HttpContext.Current.Cache.GetEnumerator();
using (iter as IDisposable)
{
while (iter.MoveNext())
{
string s;
if ((s = iter.Key as string) != null && s.Contains("subcat"))
{
//... let the magic happen
}
}
}
to do it with LINQ you could do something like:
public static class Utils
{
public static IEnumerable<KeyValuePair<object, object>> ForLinq(this IDictionaryEnumerator iter)
{
using (iter as IDisposable)
{
while (iter.MoveNext()) yield return new KeyValuePair<object, object>(iter.Key, iter.Value);
}
}
}
and use like:
var items = HttpContext.Current.Cache.GetEnumerator().ForLinq()
.Where(pair => ((string)pair.Key).Contains("subcat"));

Best way to remove items from a collection

What is the best way to approach removing items from a collection in C#, once the item is known, but not its index? This is one way to do it, but it seems inelegant at best.
//Remove the existing role assignment for the user.
int cnt = 0;
int assToDelete = 0;
foreach (SPRoleAssignment spAssignment in workspace.RoleAssignments)
{
if (spAssignment.Member.Name == shortName)
{
assToDelete = cnt;
}
cnt++;
}
workspace.RoleAssignments.Remove(assToDelete);
What I would really like to do is find the item to remove by property (in this case, name) without looping through the entire collection and using 2 additional variables.
If RoleAssignments is a List<T> you can use the following code.
workSpace.RoleAssignments.RemoveAll(x =>x.Member.Name == shortName);
If you want to access members of the collection by one of their properties, you might consider using a Dictionary<TKey, TValue> or KeyedCollection<TKey, TItem> instead. This way you don't have to search for the item you're looking for.
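As a rough sketch of the KeyedCollection option, using a hypothetical RoleAssignment class in place of SPRoleAssignment (all type and member names below are invented for illustration):

```csharp
using System.Collections.ObjectModel;

// Hypothetical stand-in for SPRoleAssignment, keyed by member name.
public class RoleAssignment
{
    public string MemberName { get; set; }
}

public class RoleAssignmentsByName : KeyedCollection<string, RoleAssignment>
{
    // Tells the collection which property acts as the lookup key.
    protected override string GetKeyForItem(RoleAssignment item) => item.MemberName;
}

public static class KeyedRemovalDemo
{
    public static int RemoveByName(string shortName)
    {
        var roles = new RoleAssignmentsByName
        {
            new RoleAssignment { MemberName = "jsmith" },
            new RoleAssignment { MemberName = "mjones" }
        };
        roles.Remove(shortName);   // removes by key, no manual search loop
        return roles.Count;
    }
}
```

This only helps if you control the collection type; it does not change how an existing SharePoint collection behaves.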
Otherwise, you could at least do this:
foreach (SPRoleAssignment spAssignment in workspace.RoleAssignments)
{
if (spAssignment.Member.Name == shortName)
{
workspace.RoleAssignments.Remove(spAssignment);
break;
}
}
@smaclell asked why reverse iteration was more efficient in a comment to @sambo99.
Sometimes it's more efficient. Consider you have a list of people, and you want to remove or filter all customers with a credit rating < 1000;
We have the following data
"Bob" 999
"Mary" 999
"Ted" 1000
If we were to iterate forward, we'd soon get into trouble
for( int idx = 0; idx < list.Count ; idx++ )
{
if( list[idx].Rating < 1000 )
{
list.RemoveAt(idx); // whoops!
}
}
At idx = 0 we remove Bob, which then shifts all remaining elements left. The next time through the loop idx = 1, but list[1] is now Ted instead of Mary. We end up skipping Mary by mistake. We could use a while loop, and we could introduce more variables.
Or, we just reverse iterate:
for (int idx = list.Count-1; idx >= 0; idx--)
{
if (list[idx].Rating < 1000)
{
list.RemoveAt(idx);
}
}
All the indexes to the left of the removed item stay the same, so you don't skip any items.
The same principle applies if you're given a list of indexes to remove from an array. In order to keep things straight you need to sort the list and then remove the items from highest index to lowest.
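A short sketch of the index-list case, assuming a List<T> (arrays cannot shrink in place) and a hypothetical helper name:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IndexRemoval
{
    public static void RemoveAtIndexes<T>(List<T> list, IEnumerable<int> indexes)
    {
        // Distinct guards against removing the same position twice;
        // descending order keeps the remaining indexes valid.
        foreach (int index in indexes.Distinct().OrderByDescending(i => i))
        {
            list.RemoveAt(index);
        }
    }
}
```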
Nowadays you can just use List<T>.RemoveAll with a lambda and declare what you're doing in a straightforward manner.
list.RemoveAll(o => o.Rating < 1000);
When removing a single item, iterating forwards is no less efficient than iterating backwards. You could also use List<T>.FindIndex for this.
int removeIndex = list.FindIndex(o => o.Name == "Ted");
if( removeIndex != -1 )
{
list.RemoveAt(removeIndex);
}
If it's an ICollection then you won't have a RemoveAll method. Here's an extension method that will do it:
public static void RemoveAll<T>(this ICollection<T> source,
Func<T, bool> predicate)
{
if (source == null)
throw new ArgumentNullException("source", "source is null.");
if (predicate == null)
throw new ArgumentNullException("predicate", "predicate is null.");
source.Where(predicate).ToList().ForEach(e => source.Remove(e));
}
Based on:
http://phejndorf.wordpress.com/2011/03/09/a-removeall-extension-for-the-collection-class/
For a simple List structure the most efficient way seems to be using the Predicate RemoveAll implementation.
Eg.
workSpace.RoleAssignments.RemoveAll(x =>x.Member.Name == shortName);
The reasons are:
The predicate-based RemoveAll method is implemented in List<T> and has access to the internal array storing the actual data. It compacts the retained items in a single pass.
The RemoveAt method is comparatively slow when called repeatedly: each call shifts every element after the removed index down by one. Removing many items one at a time is therefore quadratic in the worst case, even with reverse iteration.
If you are stuck implementing this in the pre-C# 3.0 era, you have 2 options.
The easily maintainable option. Copy all the items you want to keep into a new list and swap the underlying list.
Eg.
List<int> list2 = new List<int>() ;
foreach (int i in GetList())
{
if (!(i % 2 == 0))
{
list2.Add(i);
}
}
list = list2; // 'list' is the variable that held the original list
Or
The tricky slightly faster option, which involves shifting all the data in the list down when it does not match and then resizing the array.
If you are removing stuff really frequently from a list, perhaps another structure like a Hashtable (.NET 1.1), a Dictionary (.NET 2.0) or a HashSet (.NET 3.5) is better suited for this purpose.
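For instance, a HashSet<T> supports both removal of a known item and predicate-based removal without any index bookkeeping. A tiny sketch with made-up data:

```csharp
using System.Collections.Generic;

public static class HashSetRemoval
{
    public static HashSet<string> Demo()
    {
        var names = new HashSet<string> { "Bob", "Mary", "Ted" };
        names.Remove("Mary");                       // hash lookup, no scan or shifting
        names.RemoveWhere(n => n.StartsWith("B"));  // predicate-based removal
        return names;
    }
}
```

The trade-off is that a HashSet<T> does not keep duplicates or a guaranteed order, so it only fits when the list semantics are not needed.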
What type is the collection? If it's List, you can use the helpful "RemoveAll":
int cnt = workspace.RoleAssignments
.RemoveAll(spa => spa.Member.Name == shortName);
(This works in .NET 2.0. Of course, if you don't have the newer compiler, you'll have to use "delegate (SPRoleAssignment spa) { return spa.Member.Name == shortName; }" instead of the nice lambda syntax.)
Another approach if it's not a List, but still an ICollection:
var toRemove = workspace.RoleAssignments
.FirstOrDefault(spa => spa.Member.Name == shortName);
if (toRemove != null) workspace.RoleAssignments.Remove(toRemove);
This requires the Enumerable extension methods. (You can copy the Mono ones in, if you are stuck on .NET 2.0). If it's some custom collection that cannot take an item, but MUST take an index, some of the other Enumerable methods, such as Select, pass in the integer index for you.
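As a sketch of the index-passing overload of Select mentioned above, this hypothetical helper finds the position of the first matching item, which could then be handed to a collection that only removes by index:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexLookup
{
    // Returns the index of the first match, or -1 when nothing matches.
    public static int IndexOfFirst<T>(IEnumerable<T> items, Func<T, bool> match)
    {
        return items.Select((item, index) => new { item, index })
                    .Where(pair => match(pair.item))
                    .Select(pair => pair.index)
                    .DefaultIfEmpty(-1)
                    .First();
    }
}
```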
This is my generic solution
public static IEnumerable<T> Remove<T>(this IEnumerable<T> items, Func<T, bool> match)
{
var list = items.ToList();
for (int idx = 0; idx < list.Count(); idx++)
{
if (match(list[idx]))
{
list.RemoveAt(idx);
idx--; // the list is 1 item shorter
}
}
return list.AsEnumerable();
}
It would look much simpler if extension methods supported passing by reference!
usage:
var result = new string[] { "mike", "john", "ali" };
result = result.Remove(x => x == "mike").ToArray();
Assert.IsTrue(result.Length == 2);
EDIT: ensured that the list looping remains valid even when deleting items by decrementing the index (idx).
Here is a pretty good way to do it
http://support.microsoft.com/kb/555972
System.Collections.ArrayList arr = new System.Collections.ArrayList();
arr.Add("1");
arr.Add("2");
arr.Add("3");
/*This throws an exception
foreach (string s in arr)
{
arr.Remove(s);
}
*/
//whereas this works correctly
Console.WriteLine(arr.Count);
foreach (string s in new System.Collections.ArrayList(arr))
{
arr.Remove(s);
}
Console.WriteLine(arr.Count);
Console.ReadKey();
There is another approach you can take depending on how you're using your collection. If you're downloading the assignments one time (e.g., when the app runs), you could translate the collection on the fly into a hashtable where:
shortname => SPRoleAssignment
If you do this, then when you want to remove an item by short name, all you need to do is remove the item from the hashtable by key.
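A minimal sketch of that lookup table, assuming a Dictionary keyed by short name. The tuples below are hypothetical stand-ins for SPRoleAssignment objects, which are not reproduced here.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for the downloaded role assignments.
var assignments = new List<(string ShortName, string Role)>
{
    ("alice", "Reader"),
    ("bob", "Contributor"),
    ("carol", "Owner"),
};

// One pass to build the table: shortname => assignment.
var byShortName = new Dictionary<string, (string ShortName, string Role)>();
foreach (var assignment in assignments)
{
    byShortName[assignment.ShortName] = assignment;
}

// Removal by short name is now a single keyed operation.
bool removed = byShortName.Remove("bob");
```

Note that this only removes the entry from the lookup table; the underlying collection would still need its own Remove call.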
Unfortunately, if you're loading these SPRoleAssignments a lot, that obviously isn't going to be any more cost efficient in terms of time. The suggestions other people made about using Linq would be good if you're using a new version of the .NET Framework, but otherwise, you'll have to stick to the method you're using.
Taking a similar approach with a Dictionary, I have done this:
Dictionary<string, bool> sourceDict = new Dictionary<string, bool>();
sourceDict.Add("Sai", true);
sourceDict.Add("Sri", false);
sourceDict.Add("SaiSri", true);
sourceDict.Add("SaiSriMahi", true);
var itemsToDelete = sourceDict.Where(DictItem => DictItem.Value == false);
foreach (var item in itemsToDelete)
{
sourceDict.Remove(item.Key);
}
Note:
The code above fails in the .NET Client Profile (3.5 and 4.5), and some readers mentioned that it also fails for them in .NET 4.0; I am not sure which settings cause the problem. Where is evaluated lazily, so the dictionary is still being enumerated while items are removed from it, which raises the error "Collection was modified; enumeration operation may not execute."
To avoid that error, add .ToList() to the Where statement:
var itemsToDelete = sourceDict.Where(DictItem => DictItem.Value == false).ToList();
Per MSDN, the Client Profile is discontinued from .NET 4.5 onwards: http://msdn.microsoft.com/en-us/library/cc656912(v=vs.110).aspx
Save your items first, then delete them.
var itemsToDelete = Items.Where(x => !!!your condition!!!).ToArray();
for (int i = 0; i < itemsToDelete.Length; ++i)
Items.Remove(itemsToDelete[i]);
You need to override GetHashCode() in your Item class.
The best way to do it is by using LINQ.
Example class:
public class Product
{
public string Name { get; set; }
public string Price { get; set; }
}
LINQ query:
int removedCount = collection1.RemoveAll(w => collection2.Any(q => q.Name == w.Name));
This removes every element from collection1 whose Name matches the Name of any element in collection2. (Assuming collection1 is a List<Product>, RemoveAll returns the number of elements removed.)
Remember to use: using System.Linq;
To do this while looping through the collection without getting the "collection was modified" exception, this is the approach I've taken in the past. Note the .ToList() at the end of the original collection: it creates a copy in memory to iterate over, so you can modify the existing collection.
foreach (SPRoleAssignment spAssignment in workspace.RoleAssignments.ToList())
{
if (spAssignment.Member.Name == shortName)
{
workspace.RoleAssignments.Remove(spAssignment);
}
}
If you have a List<T>, then List<T>.RemoveAll is your best bet. There can't be anything more efficient: internally it does the array moving in one shot, and it is O(N).
If all you have is an IList<T> or an ICollection<T>, you have roughly these three options:
public static void RemoveAll<T>(this IList<T> ilist, Predicate<T> predicate) // O(N^2)
{
for (var index = ilist.Count - 1; index >= 0; index--)
{
var item = ilist[index];
if (predicate(item))
{
ilist.RemoveAt(index);
}
}
}
or
public static void RemoveAll<T>(this ICollection<T> icollection, Predicate<T> predicate) // O(N)
{
var nonMatchingItems = new List<T>();
// Move all the items that do not match to another collection.
foreach (var item in icollection)
{
if (!predicate(item))
{
nonMatchingItems.Add(item);
}
}
// Clear the collection and then copy back the non-matched items.
icollection.Clear();
foreach (var item in nonMatchingItems)
{
icollection.Add(item);
}
}
or
public static void RemoveAll<T>(this ICollection<T> icollection, Func<T, bool> predicate) // O(N^2)
{
foreach (var item in icollection.Where(predicate).ToList())
{
icollection.Remove(item);
}
}
Go for either 1 or 2.
1 is lighter on memory and faster if you have fewer deletes to perform (i.e. the predicate is false most of the time).
2 is faster if you have more deletes to perform.
3 is the cleanest code but performs poorly IMO. Again, all of that depends on the input data.
For some benchmarking details see https://github.com/dotnet/BenchmarkDotNet/issues/1505
A lot of good responses here; I especially like the lambda expressions...very clean. I was remiss, however, in not specifying the type of Collection. This is a SPRoleAssignmentCollection (from MOSS) that only has Remove(int) and Remove(SPPrincipal), not the handy RemoveAll(). So, I have settled on this, unless there is a better suggestion.
foreach (SPRoleAssignment spAssignment in workspace.RoleAssignments)
{
if (spAssignment.Member.Name != shortName) continue;
workspace.RoleAssignments.Remove((SPPrincipal)spAssignment.Member);
break;
}

Is there a built-in method to compare collections?

I would like to compare the contents of a couple of collections in my Equals method. I have a Dictionary and an IList. Is there a built-in method to do this?
Edited:
I want to compare two Dictionaries and two ILists, so I think what equality means is clear - if the two dictionaries contain the same keys mapped to the same values, then they're equal.
Enumerable.SequenceEqual
Determines whether two sequences are equal by comparing their elements by using a specified IEqualityComparer(T).
You can't directly compare the list & the dictionary, but you could compare the list of values from the Dictionary with the list
As others have suggested and have noted, SequenceEqual is order-sensitive. To solve that, you can sort the dictionary by key (which is unique, and thus the sort is always stable) and then use SequenceEqual. The following expression checks if two dictionaries are equal regardless of their internal order:
dictionary1.OrderBy(kvp => kvp.Key).SequenceEqual(dictionary2.OrderBy(kvp => kvp.Key))
EDIT: As pointed out by Jeppe Stig Nielsen, some objects have an IComparer<T> that is incompatible with their IEqualityComparer<T>, yielding incorrect results. When using keys with such an object, you must specify a correct IComparer<T> for those keys. For example, with string keys (which exhibit this issue), you must do the following in order to get correct results:
dictionary1.OrderBy(kvp => kvp.Key, StringComparer.Ordinal).SequenceEqual(dictionary2.OrderBy(kvp => kvp.Key, StringComparer.Ordinal))
In addition to the mentioned SequenceEqual, which is true if two lists are of equal length and their corresponding elements compare equal according to a comparer (which may be the default comparer, i.e. an overridden Equals()), it is worth mentioning that in .NET 4 there is SetEquals on ISet objects, which ignores the order of elements and any duplicate elements.
So if you want to have a list of objects, but they don't need to be in a specific order, consider that an ISet (like a HashSet) may be the right choice.
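For illustration, a small sketch of SetEquals on a HashSet<T>; the sample values are made up. The argument can be any IEnumerable<T>, so the second collection does not have to be a set itself.

```csharp
using System.Collections.Generic;

var set = new HashSet<string> { "a", "b", "c" };
var list = new List<string> { "c", "a", "b", "a" }; // different order, one duplicate

// Order and duplicates in the argument are ignored.
bool sameElements = set.SetEquals(list);

// An extra element makes the comparison fail.
bool withExtra = set.SetEquals(new[] { "a", "b", "c", "d" });
```

Here sameElements is true and withExtra is false.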
Take a look at the Enumerable.SequenceEqual method
var dictionary = new Dictionary<int, string>() {{1, "a"}, {2, "b"}};
var intList = new List<int> {1, 2};
var stringList = new List<string> {"a", "b"};
var test1 = dictionary.Keys.SequenceEqual(intList);
var test2 = dictionary.Values.SequenceEqual(stringList);
This is not directly answering your questions, but both the MS' TestTools and NUnit provide
CollectionAssert.AreEquivalent
which does pretty much what you want.
I didn't know about Enumerable.SequenceEqual method (you learn something every day....), but I was going to suggest using an extension method; something like this:
public static bool IsEqual(this List<int> InternalList, List<int> ExternalList)
{
if (InternalList.Count != ExternalList.Count)
{
return false;
}
else
{
for (int i = 0; i < InternalList.Count; i++)
{
if (InternalList[i] != ExternalList[i])
return false;
}
}
return true;
}
Interestingly enough, after taking 2 seconds to read about SequenceEqual, it looks like Microsoft has built the function I described for you.
.NET lacks any powerful tools for comparing collections. I've developed a simple solution you can find at the link below:
http://robertbouillon.com/2010/04/29/comparing-collections-in-net/
This will perform an equality comparison regardless of order:
var list1 = new[] { "Bill", "Bob", "Sally" };
var list2 = new[] { "Bob", "Bill", "Sally" };
bool isequal = list1.Compare(list2).IsSame;
This will check to see if items were added / removed:
var list1 = new[] { "Billy", "Bob" };
var list2 = new[] { "Bob", "Sally" };
var diff = list1.Compare(list2);
var onlyinlist1 = diff.Removed; //Billy
var onlyinlist2 = diff.Added; //Sally
var inbothlists = diff.Equal; //Bob
This will see what items in the dictionary changed:
var original = new Dictionary<int, string>() { { 1, "a" }, { 2, "b" } };
var changed = new Dictionary<int, string>() { { 1, "aaa" }, { 2, "b" } };
var diff = original.Compare(changed, (x, y) => x.Value == y.Value, (x, y) => x.Value == y.Value);
foreach (var item in diff.Different)
Console.Write("{0} changed to {1}", item.Key.Value, item.Value.Value);
//Will output: a changed to aaa
To compare collections you can also use LINQ. Enumerable.Intersect returns all pairs that are equal. You can compare two dictionaries like this:
(dict1.Count == dict2.Count) && dict1.Intersect(dict2).Count() == dict1.Count
The first comparison is needed because dict2 can contain all the keys from dict1 and more.
You can also think of variations using Enumerable.Except and Enumerable.Union that lead to similar results, but which can also be used to determine the exact differences between sets.
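One possible variation with Enumerable.Except, sketched with made-up dictionaries. It relies on the default equality of KeyValuePair<TKey, TValue>, which compares both key and value, so a changed value shows up on both sides of the difference.

```csharp
using System.Collections.Generic;
using System.Linq;

var dict1 = new Dictionary<string, int> { { "cat", 2 }, { "dog", 3 }, { "x", 4 } };
var dict2 = new Dictionary<string, int> { { "cat", 2 }, { "dog", 5 }, { "y", 4 } };

// Pairs present in one dictionary but not in the other.
var onlyInFirst = dict1.Except(dict2).ToList();   // (dog, 3) and (x, 4)
var onlyInSecond = dict2.Except(dict1).ToList();  // (dog, 5) and (y, 4)

// Pairs that match exactly in both.
var inBoth = dict1.Intersect(dict2).ToList();     // (cat, 2)

// The dictionaries are equal only if neither side has leftovers.
bool equal = onlyInFirst.Count == 0 && onlyInSecond.Count == 0;
```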
How about this example:
static void Main()
{
// Create a dictionary and add several elements to it.
var dict = new Dictionary<string, int>();
dict.Add("cat", 2);
dict.Add("dog", 3);
dict.Add("x", 4);
// Create another dictionary.
var dict2 = new Dictionary<string, int>();
dict2.Add("cat", 2);
dict2.Add("dog", 3);
dict2.Add("x", 4);
// Test for equality.
bool equal = false;
if (dict.Count == dict2.Count) // Require equal count.
{
equal = true;
foreach (var pair in dict)
{
int value;
if (dict2.TryGetValue(pair.Key, out value))
{
// Require value be equal.
if (value != pair.Value)
{
equal = false;
break;
}
}
else
{
// Require key be present.
equal = false;
break;
}
}
}
Console.WriteLine(equal);
}
Courtesy : https://www.dotnetperls.com/dictionary-equals
For ordered collections (List, Array) use SequenceEqual
for HashSet use SetEquals
for Dictionary you can do:
namespace System.Collections.Generic {
public static class ExtensionMethods {
public static bool DictionaryEquals<TKey, TValue>(this IReadOnlyDictionary<TKey, TValue> d1, IReadOnlyDictionary<TKey, TValue> d2) {
if (object.ReferenceEquals(d1, d2)) return true;
if (d2 is null || d1.Count != d2.Count) return false;
foreach (var (d1key, d1value) in d1) {
if (!d2.TryGetValue(d1key, out TValue d2value)) return false;
if (!EqualityComparer<TValue>.Default.Equals(d1value, d2value)) return false; // also safe when a value is null
}
return true;
}
}
}
(A more optimized solution will use sorting but that will require IComparable<TValue>)
No, because the framework doesn't know how to compare the contents of your lists.
Have a look at this:
http://blogs.msdn.com/abhinaba/archive/2005/10/11/479537.aspx
public bool CompareStringLists(List<string> list1, List<string> list2)
{
if (list1.Count != list2.Count) return false;
foreach(string item in list1)
{
if (!list2.Contains(item)) return false;
}
return true;
}
There wasn't, there isn't, and there might never be, at least I believe so. The reason is that collection equality is probably user-defined behavior.
Elements in collections are not supposed to be in a particular order; although they do naturally have an ordering, it is not something the comparison algorithm should rely on. Say you have these two collections:
{1, 2, 3, 4}
{4, 3, 2, 1}
Are they equal or not? You must know, but I don't know your point of view.
Collections are conceptually unordered by default, until an algorithm provides the sorting rules. SQL Server brings the same thing to your attention when you try to do pagination: it requires you to provide sorting rules:
https://learn.microsoft.com/en-US/sql/t-sql/queries/select-order-by-clause-transact-sql?view=sql-server-2017
Yet another two collections:
{1, 2, 3, 4}
{1, 1, 1, 2, 2, 3, 4}
Again, are they equal or not? You tell me ..
Element repeatability of a collection plays its role in different scenarios and some collections like Dictionary<TKey, TValue> don't even allow repeated elements.
I believe these kinds of equality are application defined and the framework therefore did not provide all of the possible implementations.
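To make the point concrete, here is a small sketch that runs the example collections above through three different notions of equality. The choice of these three is mine, for illustration; an application may well need yet another definition.

```csharp
using System.Collections.Generic;
using System.Linq;

var a = new[] { 1, 2, 3, 4 };
var b = new[] { 4, 3, 2, 1 };
var c = new[] { 1, 1, 1, 2, 2, 3, 4 };

// Same elements in the same order?
bool abSequence = a.SequenceEqual(b);                                 // false

// Same elements with the same repeat counts, in any order?
bool abMultiset = a.OrderBy(x => x).SequenceEqual(b.OrderBy(x => x)); // true
bool acMultiset = a.OrderBy(x => x).SequenceEqual(c.OrderBy(x => x)); // false

// Same distinct elements, ignoring order and repeats?
bool acSet = new HashSet<int>(a).SetEquals(c);                        // true
```

The same pair of collections is "equal" under one definition and "not equal" under another, which is why the caller has to pick.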
Well, in general cases Enumerable.SequenceEqual is good enough but it returns false in the following case:
var a = new Dictionary<String, int> { { "2", 2 }, { "1", 1 }, };
var b = new Dictionary<String, int> { { "1", 1 }, { "2", 2 }, };
Debug.Print("{0}", a.SequenceEqual(b)); // false
I read some answers to questions like this (you may google for them), and this is what I would use in general:
public static class CollectionExtensions {
public static bool Represents<T>(this IEnumerable<T> first, IEnumerable<T> second) {
if(object.ReferenceEquals(first, second)) {
return true;
}
if(first is IOrderedEnumerable<T> && second is IOrderedEnumerable<T>) {
return Enumerable.SequenceEqual(first, second);
}
if(first is ICollection<T> && second is ICollection<T>) {
if(first.Count()!=second.Count()) {
return false;
}
}
first=first.OrderBy(x => x.GetHashCode());
second=second.OrderBy(x => x.GetHashCode());
return CollectionExtensions.Represents(first, second);
}
}
That means one collection represents the other in its elements, including how many times each is repeated, without taking the original ordering into account. Some notes on the implementation:
GetHashCode() is just for the ordering, not for equality; I think it's enough in this case
Count() will not really enumerate the collection; it falls directly into the property implementation of ICollection<T>.Count
If the references are equal, it's just Boris
I've made my own compare method. It returns common, missing, and extra values.
private static void Compare<T>(IEnumerable<T> actual, IEnumerable<T> expected, out IList<T> common, out IList<T> missing, out IList<T> extra) {
common = new List<T>();
missing = new List<T>();
extra = new List<T>();
var expected_ = new LinkedList<T>( expected );
foreach (var item in actual) {
if (expected_.Remove( item )) {
common.Add( item );
} else {
extra.Add( item );
}
}
foreach (var item in expected_) {
missing.Add( item );
}
}
Comparing dictionaries' contents:
To compare two Dictionary<K, V> objects, we can rely on the keys being unique, so comparing the two key sets tells us whether the dictionaries contain the same keys. Note that this checks keys only; if the values matter as well, each key's value must also be compared.
Dictionary<K, V> dictionaryA, dictionaryB;
bool areDictionaryContentsEqual = new HashSet<K>(dictionaryA.Keys).SetEquals(dictionaryB.Keys);
Comparing collections' contents:
To compare two ICollection<T> objects, we need to check:
If they are of the same length.
If every T value that appears in the first collection appears an equal number of times in the second.
public static bool AreCollectionContentsEqual<T>(ICollection<T> collectionA, ICollection<T> collectionB)
where T : notnull
{
if (collectionA.Count != collectionB.Count)
{
return false;
}
Dictionary<T, int> countByValueDictionary = new(collectionA.Count);
foreach(T item in collectionA)
{
countByValueDictionary[item] = countByValueDictionary.TryGetValue(item, out int count)
? count + 1
: 1;
}
foreach (T item in collectionB)
{
if (!countByValueDictionary.TryGetValue(item, out int count) || count < 1)
{
return false;
}
countByValueDictionary[item] = count - 1;
}
return true;
}
These solutions should be optimal since their time and memory complexities are O(n), while the solutions that use ordering/sorting have time and memory complexities greater than O(n).
