C# ToDictionary with ContainsKey check

I have a list that I want to put into a dictionary; for simplicity, the values being inserted will all be the same.
I can use a foreach loop:
List<string> list = new List<string>();
list.Add("Earth");
list.Add("Wind");
list.Add("Fire");
list.Add("Water");
list.Add("Water"); // Will NOT BE INSERTED using the foreach loop
var myDictionary = new Dictionary<string, int>();
foreach (string value in list)
{
    if (!myDictionary.ContainsKey(value))
    {
        myDictionary.Add(value, 1);
    }
}
The above works.
But I want to use ToDictionary to do the same thing, like this:
Dictionary<string, int> myDictionary2 = list.ToDictionary(i => i, i => 1);
Of course this fails (ToDictionary throws an ArgumentException) because I'm adding "Water" twice.
What is the correct way of checking for duplicate entries when using ToDictionary?

You could use Distinct() to filter out duplicates:
Dictionary<string, int> myDictionary2 = list.Distinct().ToDictionary(i => i, i => 1);
The same approach would make your traditional loop much clearer too, since you don't have to check "manually" for duplicates:
foreach (string value in list.Distinct())
{
    myDictionary.Add(value, 1);
}
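Both variants can be seen side by side in the small, self-contained sketch below, using the question's five-item list. The second variant uses Dictionary.TryAdd, which exists only from .NET Core 2.0 / .NET Standard 2.1 onward; whether that is available to you is an assumption about your target framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<string> { "Earth", "Wind", "Fire", "Water", "Water" };

// Option 1: drop duplicates first, so ToDictionary never sees the same key twice.
Dictionary<string, int> viaDistinct = list.Distinct().ToDictionary(i => i, i => 1);

// Option 2 (assumes .NET Core 2.0+): TryAdd returns false instead of throwing
// when the key already exists, so no separate ContainsKey check is needed.
var viaTryAdd = new Dictionary<string, int>();
foreach (string value in list)
{
    viaTryAdd.TryAdd(value, 1); // the second "Water" is silently skipped
}

Console.WriteLine(viaDistinct.Count); // 4
Console.WriteLine(viaTryAdd.Count);   // 4
```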

Distinct is one option that avoids the duplicate key issue. If you need a count of duplicates, you might try a GroupBy instead:
var dict = list.GroupBy(i => i).ToDictionary(g => g.Key, g => g.Count());
If your application is not just a simple string-list/duplicate-count structure, you might get some mileage from choosing a different structure, such as a Lookup (which you get by calling the ToLookup extension method), or possibly from a grouping like the GroupBy I used above.
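A minimal sketch of the counting idea, plus the ToLookup alternative, assuming the same five-item list as the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<string> { "Earth", "Wind", "Fire", "Water", "Water" };

// GroupBy collects equal strings together; Count() gives the number of occurrences.
Dictionary<string, int> counts = list
    .GroupBy(i => i)
    .ToDictionary(g => g.Key, g => g.Count());

// ToLookup builds a key -> sequence map in one step.
ILookup<string, string> lookup = list.ToLookup(i => i);

Console.WriteLine(counts["Water"]);         // 2
Console.WriteLine(lookup["Water"].Count()); // 2
Console.WriteLine(lookup["Air"].Count());   // 0: a missing key gives an empty sequence
```

One design difference worth knowing: indexing a lookup with a missing key returns an empty sequence, while indexing a Dictionary with a missing key throws KeyNotFoundException.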

Related

Check if keys exist in dictionary

I have some legacy code that converts lists to a dictionary after some manipulation, as shown below:
var items = await processTask;
var itemDict = items.ToDictionary(dto => dto.ClientId, dto => mapper.ConvertTo(dto, hj));
But recently we started seeing this error, which suggests we are getting duplicate keys:
An item with the same key has already been added
What's the best way to fix this so that duplicate keys don't throw an exception but get logged instead? Can we do this in LINQ, or does it have to be done in a for loop?
Unfortunately, you can't eliminate duplicates inside ToDictionary itself. You have to remove them before calling ToDictionary, for example with Distinct or something similar. But it may be better to use an explicit loop, where you get the opportunity to do something with each duplicate:
var dict = new Dictionary<int, string>(); // whatever mapper converts to
foreach (var dto in items)
{
    if (dict.ContainsKey(dto.ClientId))
    {
        // log duplicate here or do something
        continue;
    }
    dict.Add(dto.ClientId, mapper.ConvertTo(dto, hj));
}
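Here is a runnable sketch of the loop-and-log idea. The DTOs are modelled as value tuples, MapToValue is a hypothetical stand-in for mapper.ConvertTo(dto, hj), and Console.WriteLine stands in for your logger. It uses Dictionary.TryAdd (.NET Core 2.0 or later, an assumption about your framework) instead of ContainsKey plus Add:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data: (ClientId, Name) tuples in place of the real DTOs.
var items = new List<(int ClientId, string Name)>
{
    (1, "Alice"), (2, "Bob"), (1, "Alice again")
};

// Hypothetical stand-in for mapper.ConvertTo(dto, hj).
string MapToValue((int ClientId, string Name) dto) => dto.Name.ToUpperInvariant();

var itemDict = new Dictionary<int, string>();
var duplicateIds = new List<int>();

foreach (var dto in items)
{
    // TryAdd returns false (and leaves the first value in place) on a duplicate key.
    if (!itemDict.TryAdd(dto.ClientId, MapToValue(dto)))
    {
        duplicateIds.Add(dto.ClientId);
        Console.WriteLine($"Duplicate ClientId {dto.ClientId} skipped"); // swap in your logger
    }
}
```

One trade-off: with TryAdd the mapping runs even for a duplicate, because the value is computed before TryAdd is called; if the mapping is expensive, the ContainsKey check in the loop above avoids that.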
You can do it with LINQ using GroupBy:
var itemDict = items
    .GroupBy(dto => dto.ClientId)
    .ToDictionary(gr => gr.Key, gr => mapper.ConvertTo(gr.First(), hj));
However, logging the duplicates will make this version less elegant.
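If you want to stay with GroupBy and still log, one option is to materialise the groups once, log those with more than one element, and then build the dictionary. Tuples and the hypothetical MapToValue again stand in for the question's DTO and mapper:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data and mapper stand-in.
var items = new List<(int ClientId, string Name)>
{
    (1, "Alice"), (2, "Bob"), (1, "Alice again")
};
string MapToValue((int ClientId, string Name) dto) => dto.Name.ToUpperInvariant();

// Materialise once so the grouping is not recomputed for logging and for ToDictionary.
var groups = items.GroupBy(dto => dto.ClientId).ToList();

// Log every key that occurred more than once.
foreach (var g in groups.Where(g => g.Count() > 1))
{
    Console.WriteLine($"ClientId {g.Key} appeared {g.Count()} times; keeping the first");
}

// Elements inside a group keep their source order, so First() is the first occurrence.
Dictionary<int, string> itemDict = groups.ToDictionary(g => g.Key, g => MapToValue(g.First()));
```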

Flatten a Dictionary<int, List<object>>

I have a dictionary with an integer Key that represents a year, and a Value that is a list of Channel objects. I need to flatten the data and create a new object from it.
Currently, my code looks like this:
Dictionary<int, List<Channel>> myDictionary;
foreach (var x in myDictionary)
{
    var result = (from a in x.Value
                  from b in anotherList
                  where a.ChannelId == b.ChannelId
                  select new NewObject
                  {
                      NewObjectYear = x.Key,
                      NewObjectName = a.First().ChannelName,
                  }).ToList();
    list.AddRange(result);
}
Notice that I am using the Key to be the value of property NewObjectYear.
I want to get rid of foreach since the dictionary contains a lot of data and doing some joins inside the iteration makes it very slow. So I decided to refactor and came up with this:
var flatten = myDictionary.SelectMany(x => x.Value.Select(y =>
new KeyValuePair<int, Channel>(x.Key, y))).ToList();
But with this, I couldn't get the Key directly. Using something like flatten.Select(x => x.Key) is definitely not the correct way. So I tried finding other ways to flatten that would be favorable for my scenario but failed. I also thought about creating a class which will contain the year and the list from the flattened but I don't know how.
Please help me with this.
Also, is there another way that doesn't require creating a new class?
It seems to me you are only filtering, so you don't need a join for that:
var anotherListIDs = new HashSet<int>(anotherList.Select(c => c.ChannelId));
foreach (var x in myDictionary)
{
    list.AddRange(x.Value
        .Where(c => anotherListIDs.Contains(c.ChannelId))
        .Select(c => new NewObject
        {
            NewObjectYear = x.Key,
            NewObjectName = c.First().ChannelName,
        }));
}
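To address the follow-up about avoiding a new class and the outer foreach, here is a sketch that combines the HashSet filter with SelectMany and a value tuple, so the dictionary key stays in scope through the outer lambda. Channel and anotherList are modelled as tuples (an assumption), and it takes each matching channel's own ChannelName rather than the c.First() from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: Channel and the other list's items are modelled as tuples.
var myDictionary = new Dictionary<int, List<(int ChannelId, string ChannelName)>>
{
    [2019] = new List<(int ChannelId, string ChannelName)> { (1, "News"), (2, "Sports") },
    [2020] = new List<(int ChannelId, string ChannelName)> { (3, "Movies") },
};
var anotherList = new List<(int ChannelId, string Label)> { (2, "x"), (3, "y") };

// Build the set once: Contains is O(1) on average instead of a join per element.
var anotherListIDs = new HashSet<int>(anotherList.Select(c => c.ChannelId));

// SelectMany flattens every year in one pass; pair.Key is still visible in the inner lambda.
var result = myDictionary
    .SelectMany(pair => pair.Value
        .Where(c => anotherListIDs.Contains(c.ChannelId))
        .Select(c => (Year: pair.Key, Name: c.ChannelName)))
    .ToList();

foreach (var r in result) Console.WriteLine($"{r.Year}: {r.Name}");
```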
You do realise that if the second element of the list in a specific dictionary entry has a matching ChannelId, you return the first element of that list, don't you?
var otherList = new OtherItem[]
{
    new OtherItem() {ChannelId = 1, ...}
};
var dictionary = new Dictionary<int, List<Channel>>
{
    { 10, // Key
        new List<Channel>() // Value
        {
            new Channel() {ChannelId = 100, ChannelName = "100"},
            new Channel() {ChannelId = 1, ChannelName = "1"},
        }
    },
};
Although the 2nd element has a matching ChannelId, you return the ChannelName of the first element.
Anyway, let's assume this is what you really want. You are right, your function isn't very efficient.
Your dictionary implements IEnumerable<KeyValuePair<int, List<Channel>>>. Therefore every x in your foreach is a KeyValuePair<int, List<Channel>>, and every x.Value is a List<Channel>.
So for every element in your dictionary (a KeyValuePair<int, List<Channel>>), you take the complete list, perform a full inner join of that list with otherList, and for each result you take the Key of the KeyValuePair and the first element of the List in the KeyValuePair.
And even if you only use part of the result (just the first element, or the first few, because of FirstOrDefault() or Take(3)), you still do this for every element of every list in your Dictionary.
Indeed your query could be much more efficient.
As you use the ChannelIds in your otherList only to find out whether they are present, one of the major improvements would be to convert the ChannelIds of otherList into a HashSet<int>, which gives very fast lookups when checking whether the ChannelId of a Channel in your Dictionary is in the set.
So for every element in your dictionary, you only have to check every ChannelId in the list to see if one of them is in the HashSet. As soon as you've found one, you can stop and return only the first element of the List and the Key.
My solution is an extension method on Dictionary<int, List<Channel>>. See Extension Methods Demystified.
public static IEnumerable<NewObject> ExtractNewObjects(this Dictionary<int, List<Channel>> dictionary,
IEnumerable<OtherItem> otherList)
{
// I'll only use the ChannelIds of the otherList, so extract them
IEnumerable<int> otherChannelIds = otherList
.Select(otherItem => otherItem.ChannelId);
return dictionary.ExtractNewObjects(otherChannelIds);
}
This calls the other ExtractNewObjects overload:
public static IEnumerable<NewObject> ExtractNewObjects(this Dictionary<int, List<Channel>> dictionary,
    IEnumerable<int> otherChannelIds)
{
    // duplicate channelIds will be removed automatically
    var otherChannelIdsSet = new HashSet<int>(otherChannelIds);
    foreach (KeyValuePair<int, List<Channel>> keyValuePair in dictionary)
    {
        // is any ChannelId in the list also in otherChannelIdsSet?
        // every keyValuePair.Value is a List<Channel>
        // every Channel has a ChannelId
        // channelId found if any of these ChannelIds is in the HashSet
        bool channelIdFound = keyValuePair.Value
            .Any(channel => otherChannelIdsSet.Contains(channel.ChannelId));
        if (channelIdFound)
        {
            yield return new NewObject()
            {
                NewObjectYear = keyValuePair.Key,
                NewObjectName = keyValuePair.Value
                    .Select(channel => channel.ChannelName)
                    .FirstOrDefault(),
            };
        }
    }
}
usage:
IEnumerable<OtherItem> otherList = ...
Dictionary<int, List<Channel>> dictionary = ...
IEnumerable<NewObject> extractedNewObjects = dictionary.ExtractNewObjects(otherList);
var someNewObjects = extractedNewObjects
    .Take(5) // here we see the benefit from the yield return
    .ToList();
We can see four improvements:
the use of HashSet<int> enables a very fast lookup to see whether a ChannelId is in otherList
the use of Any() stops enumerating the List<Channel> as soon as we've found a matching ChannelId in the HashSet
the use of yield return means you don't enumerate more elements of your Dictionary than you'll actually use
the use of Select and FirstOrDefault when creating NewObjectName prevents an exception if a List<Channel> is empty
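To see the effect of yield return combined with Take in isolation, the sketch below uses a local iterator function in place of the extension method, value tuples in place of Channel and NewObject, and a hypothetical counter that records how many dictionary entries were actually examined:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: 100 years, each with one channel whose id is also in the set.
var dictionary = new Dictionary<int, List<(int ChannelId, string ChannelName)>>();
for (int year = 2000; year < 2100; year++)
{
    dictionary[year] = new List<(int ChannelId, string ChannelName)> { (year, $"Channel {year}") };
}
var otherChannelIds = new HashSet<int>(Enumerable.Range(2000, 100));

int examined = 0; // how many dictionary entries the iterator actually visited

// Local iterator standing in for ExtractNewObjects: nothing runs until it is enumerated.
IEnumerable<(int Year, string Name)> Extract()
{
    foreach (var pair in dictionary)
    {
        examined++;
        if (pair.Value.Any(c => otherChannelIds.Contains(c.ChannelId)))
        {
            yield return (pair.Key, pair.Value.Select(c => c.ChannelName).FirstOrDefault());
        }
    }
}

var firstFive = Extract().Take(5).ToList();
Console.WriteLine($"{firstFive.Count} results, {examined} of {dictionary.Count} entries examined");
```

Because Take(5) stops requesting items after the fifth result, the iterator never visits the remaining entries; a method that built and returned a full List would process all 100 first.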

List<T> Concatenation dynamically

I am trying to concatenate List<> objects as follows:
List<Student> finalList = new List<Student>();
var sortedDict = dictOfList.OrderBy(k => k.Key);
foreach (KeyValuePair<int, List<Student>> entry in sortedDict) {
    List<Student> ListFromDict = (List<Student>)entry.Value;
    finalList.Concat(ListFromDict);
}
But no concatenation happens. finalList remains empty. Any help?
A call to Concat does not modify the original list; instead it returns a new sequence. To be totally accurate: it returns an IEnumerable<Student> that will produce the contents of both lists concatenated, without modifying either of them.
You probably want AddRange, which does modify the list:
List<Student> ListFromDict = (List<Student>)entry.Value;
finalList.AddRange(ListFromDict);
Or even shorter (in one line of code):
finalList.AddRange((List<Student>)entry.Value);
And because entry.Value is already of type List<Student>, you can use just this:
finalList.AddRange(entry.Value);
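A small sketch of the difference, with strings standing in for Student objects:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var finalList = new List<string>();
var fromDict = new List<string> { "Ann", "Bob" };

// Concat builds a new lazy sequence; finalList itself is untouched.
IEnumerable<string> combined = finalList.Concat(fromDict);
int countBefore = finalList.Count;     // 0
int combinedCount = combined.Count();  // 2

// AddRange mutates finalList in place.
finalList.AddRange(fromDict);
Console.WriteLine($"{countBefore} {combinedCount} {finalList.Count}"); // 0 2 2
```

Because Concat is lazy, combined still reads from finalList, so enumerating it after the AddRange would also see the newly added items.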
Other answers have explained why Concat isn't helping you - but they've all kept your original loop. There's no need for that - LINQ has you covered:
List<Student> finalList = dictOfList.OrderBy(k => k.Key)
.SelectMany(pair => pair.Value)
.ToList();
To be clear, this replaces the whole of your existing code, not just the body of the loop.
Much simpler :) Whenever you find yourself using a foreach loop which does nothing but build another collection, it's worth seeing whether you can eliminate that loop using LINQ.
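A self-contained version of that query, using strings in place of Student and a hypothetical three-entry dictOfList, shows the resulting order:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data; strings stand in for Student objects.
var dictOfList = new Dictionary<int, List<string>>
{
    [3] = new List<string> { "Cy" },
    [1] = new List<string> { "Ann", "Al" },
    [2] = new List<string> { "Bob" },
};

List<string> finalList = dictOfList
    .OrderBy(pair => pair.Key)      // keys 1, 2, 3
    .SelectMany(pair => pair.Value) // flatten each list, in key order
    .ToList();

Console.WriteLine(string.Join(", ", finalList)); // Ann, Al, Bob, Cy
```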
You may want to read up on the documentation for Enumerable.Concat:
Return Value
Type: System.Collections.Generic.IEnumerable<TSource>
An IEnumerable<T> that contains the concatenated elements of the two input sequences.
So you want to use the return value, which holds the new elements.
As an alternative, you can use List.AddRange, which adds the elements of the specified collection to the end of the List.
As an aside, you can also achieve your goal with a simple LINQ query:
var finalList = dictOfList.OrderBy(k => k.Key)
.SelectMany(k => k.Value)
.ToList();
As specified here, Concat generates a new sequence, whereas AddRange actually adds the elements to the list. So you should rewrite it as:
List<Student> finalList = new List<Student>();
var sortedDict = dictOfList.OrderBy(k => k.Key);
foreach (KeyValuePair<int, List<Student>> entry in sortedDict) {
    List<Student> ListFromDict = (List<Student>)entry.Value;
    finalList.AddRange(ListFromDict);
}
Furthermore, you can simplify the code by omitting the cast to List<T>, since entry.Value is already a List<T> (and AddRange technically only needs an IEnumerable<T>):
var sortedDict = dictOfList.OrderBy(k => k.Key);
foreach (KeyValuePair<int, List<Student>> entry in sortedDict) {
    finalList.AddRange(entry.Value);
}
The Concat method does not modify the original collection; instead it returns a brand new sequence with the concatenation result. So either try finalList = finalList.Concat(ListFromDict).ToList(), or use the AddRange method, which modifies the target list.

Performance Improvement Tips for ForEach loop in C#?

I need to optimize the foreach loop below. It takes a long time to get the unique items.
Alternatively, can FilterItems be converted into a list collection? If so, how? Then I could easily take the unique items from it.
The problem arises when I have 500,000 items in FilterItems.
Please suggest some ways to optimize the below code:
int i = 0;
List<object> order = new List<object>();
List<object> unique = new List<object>();
// FilterItems IS A COLLECTION OF RECORDS. CAN THIS BE CONVERTED TO A LIST COLLECTION DIRECTLY, SO THAT I CAN TAKE THE UNIQUE ITEMS FROM IT.
foreach (Record rec in FilterItems)
{
    string text = rec.GetValue("Column Name");
    int position = order.BinarySearch(text);
    if (position < 0)
    {
        order.Insert(-position - 1, text);
        unique.Add(text);
    }
    i++;
}
It's unclear what you mean by "converting FilterItems into a list" when we don't know anything about it, but you could definitely consider sorting after you've got all the items, rather than as you go:
var strings = FilterItems.Select(record => record.GetValue("Column Name"))
.Distinct()
.OrderBy(x => x)
.ToList();
The use of Distinct() here will avoid sorting lots of equal items - it looks like you only want distinct items anyway.
If you want unique to be in the original order but order to be the same items, just sorted, you could use:
var unique = FilterItems.Select(record => record.GetValue("Column Name"))
.Distinct()
.ToList();
var order = unique.OrderBy(x => x).ToList();
Now Distinct() isn't guaranteed to preserve order - but it does so in the current implementation, and that's the most natural implementation, too.
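If you prefer not to depend on Distinct's ordering at all, a HashSet<T> gives the same result with the order preserved by construction, because HashSet<T>.Add returns false for an item that is already present. A sketch, with a string array standing in for the values from rec.GetValue("Column Name"):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical values standing in for rec.GetValue("Column Name").
var values = new[] { "pear", "apple", "pear", "fig", "apple" };

// Keep the first occurrence of each value, in original order.
var seen = new HashSet<string>();
var unique = new List<string>();
foreach (var v in values)
{
    if (seen.Add(v)) unique.Add(v); // Add returns false for a repeat
}

// Sort a copy once at the end instead of inserting into a sorted list per item.
var order = unique.OrderBy(x => x).ToList();

Console.WriteLine(string.Join(", ", unique)); // pear, apple, fig
Console.WriteLine(string.Join(", ", order));  // apple, fig, pear
```

The original loop is slow mainly because each List.Insert shifts every element after the insertion point; here de-duplication is a single pass and sorting happens once.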

C# remove duplicate dictionary items from List<>?

So I have a collection of dictionaries in a list:
List<Dictionary<string, string>> inputData = new List<Dictionary<string, string>>(inputs);
List<Dictionary<string, string>> itemStack = new List<Dictionary<string, string>>();
Now, for each dictionary in inputData, I want to check whether itemStack already contains the same value (an equal dictionary).
I was thinking it would be something like this:
foreach (var item in inputData)
{
    if (!itemStack.Contains(item)) { itemStack.Add(item); }
    else { /* Duplicate found */ }
}
But it doesn't really check the values inside the items; it just assumes itemStack doesn't have them...
All I want is: if itemStack already contains an item, don't add it again.
I know I'm missing something obvious.
Thanks,
Dictionary is a reference type that doesn't override Equals, so List.Contains only checks whether the same dictionary instance is already in the list; it doesn't compare the "deep" contents as you expected.
You will have to write your own "Contains" method, either as a completely separate method or as an extension method, and then use it instead, for example:
if (!MyContains(itemStack, item)) { itemStack.Add(item); }
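One possible shape for such a method, as a sketch: MyContains is the hypothetical name from the answer, and this version treats two dictionaries as equal when they have the same number of entries and every key maps to the same value in both.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Content equality: same entry count, and every key in a maps to the same value in b.
bool SameEntries(Dictionary<string, string> a, Dictionary<string, string> b) =>
    a.Count == b.Count &&
    a.All(kv => b.TryGetValue(kv.Key, out var v) && v == kv.Value);

// Hypothetical MyContains: true if any dictionary in the list has the same contents.
bool MyContains(List<Dictionary<string, string>> stack, Dictionary<string, string> item) =>
    stack.Any(d => SameEntries(d, item));

var inputData = new List<Dictionary<string, string>>
{
    new Dictionary<string, string> { ["id"] = "1", ["name"] = "a" },
    new Dictionary<string, string> { ["name"] = "a", ["id"] = "1" }, // same contents, new object
    new Dictionary<string, string> { ["id"] = "2", ["name"] = "b" },
};

var itemStack = new List<Dictionary<string, string>>();
foreach (var item in inputData)
{
    if (!MyContains(itemStack, item)) itemStack.Add(item);
    else Console.WriteLine("Duplicate found");
}
Console.WriteLine(itemStack.Count); // 2
```

This is a linear scan per item, so it is O(n²) over the whole input; for large inputs, an IEqualityComparer<Dictionary<string, string>> used with a HashSet would scale better.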
True, a HashSet would be better, but if you want to do it here, try this (assuming you are filtering duplicate keys only):
foreach (var key in inputData.SelectMany(d => d.Keys))
{
    if (itemStack.Any(d => d.ContainsKey(key)))
    {
        // There was a duplicate
    }
}
Or, if you only care when the data is coming out, you can call:
itemStack.Distinct()
Note that for Dictionary elements, Distinct() uses the same reference equality as Contains unless you pass it an IEqualityComparer.
I think your approach is right. A HashSet is good too, but when you add a new element, it performs the same kind of equality test on the items.
Regards.
Based on your initial problem statement, you might do something like this:
var aggregateKnownKeys = itemStack.SelectMany(d => d.Keys);
itemStack.AddRange(
inputData.Select(d=> d.Where(p => !aggregateKnownKeys.Contains(p.Key))
.ToDictionary(p => p.Key, p => p.Value)));
If you only need to combine two dictionaries, you could do this to skip keys that already exist in itemStack:
var inputData = new Dictionary<string, string>();
var itemStack = new Dictionary<string, string>();
var oldStack = itemStack;
itemStack = new[] { inputData.Where(d => !oldStack.ContainsKey(d.Key)), itemStack }
    .SelectMany(d => d)
    .ToDictionary(d => d.Key, d => d.Value);
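A small check of the merge with made-up sample data, comparing a LINQ version that filters with Where (which tests every pair, whereas SkipWhile only skips the leading run of matching pairs) against a loop that uses Dictionary.TryAdd (.NET Core 2.0 or later, an assumption about your framework):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: "a" exists in both, "b" only in inputData.
var itemStack = new Dictionary<string, string> { ["a"] = "old A" };
var inputData = new Dictionary<string, string> { ["a"] = "new A", ["b"] = "B" };

// LINQ version: keep only new keys from inputData, then append the existing entries.
var merged = inputData
    .Where(p => !itemStack.ContainsKey(p.Key))
    .Concat(itemStack)
    .ToDictionary(p => p.Key, p => p.Value);

// Loop version: TryAdd leaves an existing entry untouched and returns false.
var mergedInPlace = new Dictionary<string, string>(itemStack);
foreach (var pair in inputData)
{
    mergedInPlace.TryAdd(pair.Key, pair.Value);
}

Console.WriteLine(merged["a"]);        // old A
Console.WriteLine(mergedInPlace["b"]); // B
```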
Okay so this isn't quite a full answer but it's what I did.
So I have a List of items, and instead of doing a full comparison against what's in the List (hence the others considered), I just did a single-item check:
if (!String.IsNullOrEmpty(item["itemId"]))
{
    alert.DaleksApproaching(item["itemId"]);
}
So when it sees that the item has a value, it just raises another event to get rid of it.
I like the idea of using LINQ and the method approaches above (Contains and Distinct). I haven't tried them yet, but I plan to. For now, this doesn't use LINQ :(
Thanks everyone!
