I have a dictionary of type Dictionary<MyKey, List<MyClass>> and a List of type MyClass. Now I want to check whether the latter contains elements that are not in any of the lists within the dictionary. My first approach was nesting two loops (one for the actual list, one for the lists within the dictionary). If an item is found, I break the inner loop and continue with the next element in the outer loop.
foreach (MyClass feature in features)
{
    bool found = false;
    foreach (var kv in this._features) // this._features is the dictionary
    {
        if (kv.Value.Contains(feature))
        {
            found = true;
            break; // found in this list, no need to search the rest
        }
    }
    if (!found) result.Add(feature);
}
This works so far, but I'd prefer a shorter approach, probably using LINQ. I think it could work if I flattened the values of the dictionary into one single list, but I have no clue how to achieve this.
Use SelectMany to flatten your dictionary's values into an IEnumerable<MyClass>, then use Except to get the elements of features that are missing from it:
var differentElements = features.Except(this._features.SelectMany(x => x.Value));
result.AddRange(differentElements);
This might not work as expected if MyClass doesn't override Equals and GetHashCode properly.
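In that case, here is a minimal sketch of value-based equality, assuming a hypothetical Id property identifies a MyClass instance:
public class MyClass : IEquatable<MyClass>
{
    public int Id { get; set; } // hypothetical identity field

    public bool Equals(MyClass other) => other != null && Id == other.Id;
    public override bool Equals(object obj) => Equals(obj as MyClass);
    // Must be consistent with Equals so hash-based operations (Except, Contains) work.
    public override int GetHashCode() => Id.GetHashCode();
}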
I need to analyze a task that starts with the code below, but I couldn't figure out what the LINQ part is doing. Any leads are appreciated.
foreach (var item in list.GroupBy(x => x.AccountNumber).Select(g => g.First()))
{
    ...
}
Some roughly equivalent code (i.e. code with the same effect that works slightly differently) would be:
var seenAccountNumbers = new HashSet<int>(); // Or some other data type?
foreach (var item in list)
{
    if (seenAccountNumbers.Add(item.AccountNumber))
    {
        ...
    }
}
This code is a (somewhat wasteful) way of getting the first item per account number. It's wasteful because there's no reason to group everything before picking the first item of each group.
The same thing can be implemented with an iterator function by iterating over all items in the input list while keeping track of the AccountNumber values seen so far in a HashSet. When a new one is found, yield the item and record its key.
In fact, that's how MoreLinq's DistinctBy operator is implemented:
var knownKeys = new HashSet<TKey>(comparer);
foreach (var element in source)
{
    if (knownKeys.Add(keySelector(element)))
        yield return element;
}
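Wrapped up as a complete extension method, a minimal sketch of that iterator might look like this (the MoreLinq original also offers overloads taking a custom comparer; this version uses the default one):
using System;
using System.Collections.Generic;

public static class MyEnumerableExtensions
{
    // Yields the first element for each distinct key, preserving source order.
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var knownKeys = new HashSet<TKey>();
        foreach (var element in source)
        {
            if (knownKeys.Add(keySelector(element))) // Add returns false for duplicates
                yield return element;
        }
    }
}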
From the method's description:
Returns all distinct elements of the given source, where "distinctness" is determined via a projection and the default equality comparer for the projected type.
If a key is seen multiple times, only the first element with that key is returned.
The question's code can be replaced with:
foreach (var item in list.DistinctBy(x => x.AccountNumber))
{
    ...
}
(Since .NET 6, an equivalent DistinctBy operator is also built into LINQ itself.)
Create a dictionary with AccountNumber as the key and put all your items from list into that dictionary; that is roughly what happens. The code below keeps the first item seen for each key. (Strictly speaking, LINQ to Objects does guarantee order for GroupBy: groups appear in the order their keys first occur, and g.First() returns the first matching element in source order. With other LINQ providers, such as a database, no ordering is guaranteed, and First then just means "pick one".)
var dict = new Dictionary<KeyType, ElementType>();
foreach (var item in list)
    if (!dict.ContainsKey(item.AccountNumber))
        dict[item.AccountNumber] = item;
Your original iteration would now be:
foreach (var item in dict.Values)
{
    ...
}
Asking for a non-LINQ solution is not so strange; LINQ is rarely the most performant solution, it just makes for concise, fast coding.
In the foreach loop, I want to add the Products to a List, but I want this List to not contain duplicate Products. I currently have two ideas:
1/ In the loop, before adding the Product to the List, I check whether it already exists in the List; if not, I add it.
foreach (var product in products)
{
    // code logic
    if (!listProduct.Any(x => x.Id == product.Id))
    {
        listProduct.Add(product);
    }
}
2/ In the loop, I add all the Products to the List even if there are duplicates. Then, outside the loop, I use Distinct to remove the duplicate records.
foreach (var product in products)
{
    // code logic
    listProduct.Add(product);
}
listProduct = listProduct.Distinct().ToList();
I wonder which of these two ways is the more effective one. Or are there any other ideas for adding records to the List while avoiding duplication?
I'd go for a third approach: the HashSet. It has a constructor overload that accepts an IEnumerable. This constructor removes duplicates:
If the input collection contains duplicates, the set will contain one
of each unique element. No exception will be thrown.
Source: HashSet<T> Constructor
Usage:
List<Product> myProducts = ...;
var setOfProducts = new HashSet<Product>(myProducts);
After removing duplicates, there is no proper meaning for setOfProducts[4].
Therefore a HashSet is not an IList<Product> but an ICollection<Product>: you can Count / Add / Remove, etc., everything you can do with a List. The only thing you can't do is fetch by index.
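Note that unless Product overrides Equals and GetHashCode, the HashSet deduplicates by reference, not by value. A sketch of a custom comparer that makes the set compare by Id instead (ProductIdComparer and the Id property are assumptions for illustration):
var setOfProducts = new HashSet<Product>(myProducts, new ProductIdComparer());

// Hypothetical comparer: two Products with the same Id count as equal.
class ProductIdComparer : IEqualityComparer<Product>
{
    public bool Equals(Product a, Product b) => a?.Id == b?.Id;
    public int GetHashCode(Product p) => p.Id.GetHashCode();
}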
First take the elements that are not already in the collection:
var newProducts = products.Where(x => !listProduct.Any(y => x.Id == y.Id)).ToList();
(The ToList call matters here: without it the query would be evaluated lazily while listProduct is being modified below.) Then just add them using AddRange:
listProduct.AddRange(newProducts);
Or you can use a foreach loop:
foreach (var product in newProducts)
{
    listProduct.Add(product);
}
One more easy solution, with no need to use Distinct:
var newProductList = products.Union(listProduct).ToList();
But Union does not have good performance.
From what you have included, you are storing everything in memory. If that is the case, or if you only persist after the list is ready, you can consider using List<T>.BinarySearch: https://msdn.microsoft.com/en-us/library/w4e7fxsh(v=vs.110).aspx. That way you also get an ordered list at the end. If ordering is not important, you can use a HashSet, which is very fast and meant for exactly this purpose.
Check also: https://www.dotnetperls.com/hashset
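Here is a minimal sketch of the BinarySearch approach, assuming we deduplicate on an int Id property (an assumption; adapt to your key). When the value is not found, BinarySearch returns a negative number whose bitwise complement (~index) is the correct insertion point:
var sortedIds = new List<int>();
foreach (var product in products)
{
    int index = sortedIds.BinarySearch(product.Id);
    if (index < 0)                            // not found in the list
        sortedIds.Insert(~index, product.Id); // insert at the right spot: stays sorted, no duplicates
}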
This should be pretty fast and take care of any ordering:
// build a HashSet of your primary keys type (I'm assuming integers here) containing all your list elements' keys
var hashSet = new HashSet<int>(listProduct.Select(p => p.Id));
// add all items from the products list whose Id can be added to the hashSet (so it's not a duplicate)
listProduct.AddRange(products.Where(p => hashSet.Add(p.Id)));
What you might want to consider doing instead, though, is implementing IEquatable<Product> and overriding GetHashCode() on your Product type which would make the above code a little easier and put the equality checks where they should be (inside the respective type):
var hashSet = new HashSet<Product>(listProduct);
listProduct.AddRange(products.Where(hashSet.Add));
What I am trying to do is implement a heuristic approach to an NP-complete problem: I have a list of objects (matches), each of which has a double score. I take the first element of the list sorted by score descending and remove it from the list; then all elements bound to that first one are removed as well. I iterate through the list until no elements remain.
I need a data structure which can efficiently support this, so basically it should have the following properties:
1. Generic
2. Is always sorted
3. Has fast key access
Right now SortedSet<T> looks like the best fit.
The question is: is it the optimal choice in my case?
var result = new List<Match>();
while (sortedItems.Any())
{
    var first = sortedItems.First();
    result.Add(first);
    sortedItems.Remove(first);
    foreach (var dependentFirst in first.DependentElements)
    {
        sortedItems.Remove(dependentFirst);
    }
}
What I need is something like a sorted hash table.
I assume you're not just wanting to clear the list, but you want to do something with each item as it's removed.
var toDelete = new HashSet<T>();
foreach (var item in sortedItems)
{
    if (!toDelete.Contains(item))
    {
        toDelete.Add(item);
        // do something with item here
    }
    foreach (var dependentFirst in item.DependentElements)
    {
        if (!toDelete.Contains(dependentFirst))
        {
            toDelete.Add(dependentFirst);
            // do something with dependentFirst here
        }
    }
}
sortedItems.RemoveAll(i => toDelete.Contains(i));
I think you should use two data structures - a heap and a set - heap for keeping the sorted items, set for keeping the removed items. Fill the heap with the items, then remove the top one, and add it and all its dependents to the set. Remove the second one - if it's already in the set, ignore it and move to the third, otherwise add it and its dependents to the set.
Each time you add an item to the set, also do whatever it is you plan to do with the items.
The complexity here is O(N log N), and you won't get any better than that, as you have to sort the list of items anyway. If you want to squeeze out more performance, you can add a 'Removed' boolean to each item and set it to true instead of using a set to keep track of the removed items. I don't know if this is applicable in your case.
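Here is a minimal sketch of that heap-plus-set idea using .NET 6's PriorityQueue (the Match type with Score and DependentElements is assumed from the question; the custom comparer turns the queue into a max-heap on score):
using System;
using System.Collections.Generic;

class Match
{
    public double Score;
    public List<Match> DependentElements = new List<Match>();
}

static List<Match> SelectGreedily(IEnumerable<Match> items)
{
    // Max-heap: the highest score is dequeued first.
    var heap = new PriorityQueue<Match, double>(
        Comparer<double>.Create((a, b) => b.CompareTo(a)));
    foreach (var item in items)
        heap.Enqueue(item, item.Score);

    var removed = new HashSet<Match>();
    var result = new List<Match>();
    while (heap.Count > 0)
    {
        var top = heap.Dequeue();
        if (!removed.Add(top))
            continue;                 // already knocked out as someone's dependent
        result.Add(top);
        foreach (var dep in top.DependentElements)
            removed.Add(dep);         // lazily "remove" the bound elements
    }
    return result;
}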
If I'm not mistaken, you want something like this:
var dictionary = new Dictionary<string, int>();
dictionary.Add("car", 2);
dictionary.Add("apple", 1);
dictionary.Add("zebra", 0);
dictionary.Add("mouse", 5);
dictionary.Add("year", 3);
dictionary = dictionary.OrderBy(o => o.Key).ToDictionary(o => o.Key, o => o.Value);
(Bear in mind that Dictionary<TKey, TValue> does not guarantee enumeration order, so relying on the insertion order here is fragile.)
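If the goal is a dictionary that stays sorted by key at all times, a SortedDictionary does that without re-sorting. A minimal sketch:
var sorted = new SortedDictionary<string, int>
{
    { "car", 2 },
    { "apple", 1 },
    { "zebra", 0 },
    { "mouse", 5 },
    { "year", 3 },
};
// Enumerates in key order: apple, car, mouse, year, zebra.
foreach (var kv in sorted)
    Console.WriteLine($"{kv.Key} = {kv.Value}");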
I have an object that contains a list of child objects, each of which in turn contains a list of children, and so on. Using that first generation of children only, I want to combine all those lists as cleanly and cheaply as possible. I know I can do something like
public List<T> UnifiedListOfTChildren<T>()
{
    var newlist = new List<T>();
    foreach (var childThing in myChildren)
    {
        newlist.AddRange(childThing.TChildren);
    }
    return newlist;
}
but is there a more elegant, less expensive LINQ method I'm missing?
EDIT If you've landed at this question the same way I did and are new to SelectMany, I strongly recommend this visual explanation of how to use it. Comes up near the top in google results currently, but is worth skipping straight to.
var newList = myChildren.SelectMany(c => c.TChildren);
(Append .ToList() if you need an actual List<T> rather than a deferred IEnumerable<T>.)
For now, the best I could think of is:
bool oneMoreTime = true;
while (oneMoreTime)
{
    ItemType toDelete = null;
    oneMoreTime = false;
    foreach (ItemType item in collection)
    {
        if (ShouldBeDeleted(item))
        {
            toDelete = item;
            break;
        }
    }
    if (toDelete != null)
    {
        collection.Remove(toDelete);
        oneMoreTime = true;
    }
}
I know that I have at least one extra variable here, but I included it to improve the readability of the algorithm.
The "RemoveAll" method is best.
Another common technique is:
var itemsToBeDeleted = collection.Where(i => ShouldBeDeleted(i)).ToList();
foreach (var itemToBeDeleted in itemsToBeDeleted)
    collection.Remove(itemToBeDeleted);
Another common technique is to use a "for" loop, but make sure you go backwards:
for (int i = collection.Count - 1; i >= 0; --i)
    if (ShouldBeDeleted(collection[i]))
        collection.RemoveAt(i);
Another common technique is to add the items that are not being removed to a new collection:
var newCollection = new List<whatever>();
foreach (var item in collection.Where(i => !ShouldBeDeleted(i)))
    newCollection.Add(item);
And now you have two collections. A technique I particularly like if you want to end up with two collections is to use immutable data structures. With an immutable data structure, "removing" an item does not change the data structure; it gives you back a new data structure (that re-uses bits from the old one, if possible) that does not have the item you removed. With immutable data structures you are not modifying the thing you're iterating over, so there's no problem:
var newCollection = oldCollection;
foreach (var item in oldCollection.Where(i => ShouldBeDeleted(i)))
    newCollection = newCollection.Remove(item);
or
var newCollection = ImmutableList<whatever>.Empty;
foreach (var item in oldCollection.Where(i => !ShouldBeDeleted(i)))
    newCollection = newCollection.Add(item);
And when you're done, you have two collections. The new one has the items removed, the old one is the same as it ever was.
Just as I finished typing, I remembered that there is a lambda way to do it:
collection.RemoveAll(i => ShouldBeDeleted(i));
Better way?
A forward variation on the backward for loop:
for (int i = 0; i < collection.Count; )
    if (ShouldBeDeleted(collection[i]))
        collection.RemoveAt(i);
    else
        i++;
You cannot delete from a collection inside a foreach loop (unless it is a very special collection having a special enumerator). The BCL collections will throw exceptions if the collection is modified while it is being enumerated.
You could use a for loop to delete individual elements and adjust the index accordingly. However, doing that can be error prone. Depending on the implementation of the underlying collection, it may also be expensive to delete individual elements. For instance, deleting the first element of a List<T> will copy all the remaining elements in the list.
The best solution is often to create a new collection based on the old:
var newCollection = collection.Where(item => !ShouldBeDeleted(item)).ToList();
Use ToList() or ToArray() to create the new collection or initialize your specific collection type from the IEnumerable returned by the Where() clause.
The lambda way is good. You could also use a regular for loop: unlike a foreach loop, a for loop lets you modify the list within the loop itself.
for (int i = collection.Count - 1; i >= 0; i--)
{
    if (ShouldBeDeleted(collection[i]))
        collection.RemoveAt(i);
}
I am assuming that collection is a List<T> here; the code might be a bit different if you are using a different data structure.