Converting this List of Lists to Dictionary using LINQ

I have a class :
public class Client
{
public Client()
{
TemplateKeys = new List<int>();
}
public List<int> TemplateKeys { get; set; }
}
Then I create say 3 instances :
List<Client> clients = new List<Client>();
Client client = new Client();
client.TemplateKeys.Add(1);
client.TemplateKeys.Add(2);
client.TemplateKeys.Add(3);
clients.Add(client);
//..
Client client1 = new Client();
client1.TemplateKeys.Add(1);
client1.TemplateKeys.Add(3);
clients.Add(client1);
//..
Client client2 = new Client();
client2.TemplateKeys.Add(2);
client2.TemplateKeys.Add(4);
clients.Add(client2);
Then I create a Dictionary:
Dictionary<int, string> templatesInUse = new Dictionary<int, string>();
What I want to do is take the TemplateKeys used by the clients in this list, apply Distinct() to them, and use them as keys of the templatesInUse Dictionary, where the value for now will be string.Empty. The idea is that once I have the keys, I'm going to query the database for the text associated with each key in the dictionary. Then I'm going to replace the string.Empty value with the result from the database, and I'll be able to use the templates for each user without having to query the database for the same template many times.
So what I've done was first to try to extract the distinct values which I managed to do like so:
List<int> res = clients.SelectMany(cl => cl.TemplateKeys)
.Distinct()
.ToList();
Now I want this LINQ expression to return the desired Dictionary<int, string> result. I see that LINQ has a built-in ToDictionary() extension method, but I couldn't find a way to get my result by replacing ToList() with ToDictionary() like so:
templatesInUse = clients.SelectMany(cl => cl.TemplateKeys)
.Distinct()
.ToDictionary(/* tried some things here with no success */);
I saw that almost all examples with ToDictionary() use GroupBy(). Even though I don't need grouping, and would like to see a solution that doesn't use it, I remade my LINQ like so:
templatesInUse = clients.SelectMany(cl => cl.TemplateKeys)
.Distinct()
.GroupBy(t => t)
.ToDictionary(g => g.Key, g => g.ToString());
This works to some extent, but instead of my desired string.Empty (or just "") I get something strange as the value. In theory that would still work, since those values will be replaced, but I would like a clean result: after the LINQ query executes, each TemplateKey should be a Dictionary key with an empty string as its value. And as I mentioned, I would really like to see a way without GroupBy(). Is it a must when using ToDictionary()?

You don't need grouping. Just specify the key as the number and value as string.Empty.
templatesInUse = clients.SelectMany(cl => cl.TemplateKeys).Distinct()
.ToDictionary(x => x, x => string.Empty);
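For illustration, the two-step flow described in the question (collect the distinct keys first, fill in the text later) might look like the sketch below. The TemplateCache class and the loadTemplateText delegate are hypothetical names introduced here; the delegate only stands in for the real database query, which the original post does not show.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Client
{
    public List<int> TemplateKeys { get; set; } = new List<int>();
}

// Hypothetical helper, not part of the original post.
public static class TemplateCache
{
    // Builds a dictionary keyed by every distinct template key,
    // with string.Empty as a placeholder value.
    public static Dictionary<int, string> BuildPlaceholders(IEnumerable<Client> clients)
    {
        return clients
            .SelectMany(c => c.TemplateKeys)
            .Distinct()
            .ToDictionary(key => key, key => string.Empty);
    }

    // Replaces each placeholder with the text returned by the loader.
    // The loader is an assumption standing in for the database query.
    public static void Fill(Dictionary<int, string> templates, Func<int, string> loadTemplateText)
    {
        // Iterate over a copy of the keys so the dictionary is never
        // modified while it is being enumerated.
        foreach (int key in templates.Keys.ToList())
        {
            templates[key] = loadTemplateText(key);
        }
    }
}
```

Each distinct key is passed to the loader once, which is the point of building the dictionary up front.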

Related

Flatten a sequence of sequences into a single sequence (List<string> from List<Object> contains that List<string>)

I'm trying to extract some Lists of strings into a single List.
First I have this class
public class Client
{
public string Name { get; set; }
public List<string> ApiScopes { get; set; }
}
So I'm getting my response as a List<Client>, and my intention is to collect all of the clients' scopes into a single List without looping.
I've tried this with LINQ:
var x = clients.Select(c => c.ApiScopes.Select(s => s).ToList()).ToList();
This returns a List<List<string>>, and I just want to get all of this into one distinct list of strings.

It sounds like you want SelectMany (which flattens a sequence of sequences into a single sequence), with Distinct as well if you need a list with each scope only appearing once even if it's present for multiple clients:
var scopes = clients.SelectMany(c => c.ApiScopes).Distinct().ToList();
This assumes that Client.ApiScopes is never null. If it might be null, you need a little bit more work:
var scopes = clients
.SelectMany(c => ((IEnumerable<string>) c.ApiScopes) ?? Enumerable.Empty<string>())
.Distinct()
.ToList();

You can use SelectMany to flatten results :
var scopes=clients.SelectMany(client=>client.ApiScopes)
.Distinct()
.ToList();
This is equivalent to :
var scopes= ( from client in clients
from scope in client.ApiScopes
select scope
)
.Distinct()
.ToList();

Flatten a Dictionary<int, List<object>>

I have a dictionary which has an integer Key that represents a year, and a Value which is a list of object Channel. I need to flatten the data and create a new object from it.
Currently, my code looks like this:
Dictionary<int, List<Channel>> myDictionary;
foreach(var x in myDictionary)
{
var result = (from a in x.Value
from b in anotherList
where a.ChannelId == b.ChannelId
select new NewObject
{
NewObjectYear = x.Key,
NewObjectName = a.First().ChannelName,
}).ToList();
list.AddRange(result);
}
Notice that I am using the Key to be the value of property NewObjectYear.
I want to get rid of foreach since the dictionary contains a lot of data and doing some joins inside the iteration makes it very slow. So I decided to refactor and came up with this:
var flatten = myDictionary.SelectMany(x => x.Value.Select(y =>
new KeyValuePair<int, Channel>(x.Key, y))).ToList();
But with this, I couldn't get the Key directly. Using something like flatten.Select(x => x.Key) is definitely not the correct way. So I tried finding other ways to flatten that would be favorable for my scenario but failed. I also thought about creating a class which will contain the year and the list from the flattened but I don't know how.
Please help me with this.
Also, is there also another way that doesn't have the need to create a new class?

It seems to me you are trying to do only filtering, you do not need join for that:
var anotherListIDs = new HashSet<int>(anotherList.Select(c => c.ChannelId));
foreach (var x in myDictionary)
{
list.AddRange(x.Value
.Where(c => anotherListIDs.Contains(c.ChannelId))
.Select(c => new NewObject
{
NewObjectYear = x.Key,
NewObjectName = c.First().ChannelName,
}));
}
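The question also asked how to keep the key while flattening, without creating a new class. One possible sketch uses the SelectMany overload that takes a result selector, together with C# 7 value tuples. The ChannelFlattener name and the minimal Channel class are assumptions made for this example, and it takes the name from the matching channel itself, which may or may not be what the original code intended.

```csharp
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the Channel type from the question (assumed shape).
public class Channel
{
    public int ChannelId { get; set; }
    public string ChannelName { get; set; }
}

// Hypothetical helper, not part of the original post.
public static class ChannelFlattener
{
    public static List<(int Year, string Name)> Flatten(
        Dictionary<int, List<Channel>> channelsByYear,
        IEnumerable<int> wantedChannelIds)
    {
        // A HashSet gives constant-time membership checks.
        var wanted = new HashSet<int>(wantedChannelIds);

        return channelsByYear
            // Pair every channel with the key (year) of the entry it came from.
            .SelectMany(pair => pair.Value,
                        (pair, channel) => (Year: pair.Key, Channel: channel))
            .Where(item => wanted.Contains(item.Channel.ChannelId))
            .Select(item => (Year: item.Year, Name: item.Channel.ChannelName))
            .ToList();
    }
}
```

The tuple carries the year alongside each channel, so no KeyValuePair wrapper or extra class is needed.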

You do realise that if the second element of the list in a specific dictionary entry has a matching ChannelId, you return the first element of this list, don't you?
var otherList = new OtherItem[]
{
    new OtherItem() { ChannelId = 1, ... }
};
var dictionary = new Dictionary<int, List<Channel>>
{
    {
        10, // Key
        new List<Channel>() // Value
        {
            new Channel() { ChannelId = 100, ChannelName = "100" },
            new Channel() { ChannelId = 1, ChannelName = "1" },
        }
    },
};
Although the 2nd element has a matching ChannelId, you return the ChannelName of the first element.
Anyway, let's assume this is what you really want. You are right, your function isn't very efficient.
Your dictionary implements IEnumerable<KeyValuePair<int, List<Channel>>>. Therefore every x in your foreach is a KeyValuePair<int, List<Channel>>. Every x.Value is a List<Channel>.
So for every element in your dictionary (which is a KeyValuePair<int, List<Channel>>), you take the complete list, perform a full inner join of the complete list with otherList, and for the result you take the key of the KeyValuePair and the first element of the List in the KeyValuePair.
And even though you might not use the complete result, but only the first or the first few, because of FirstOrDefault() or Take(3), you do this for every element of every list in your Dictionary.
Indeed your query could be much more efficient.
As you use the ChannelIds in your otherList only to find out whether one is present, one of the major improvements would be to convert the ChannelIds of otherList to a HashSet<int>, which gives very fast lookup when checking whether the ChannelId of one of the values in your Dictionary is in the set.
So for every element in your dictionary, you only have to check every ChannelId in the list to see if one of them is in the HashSet. As soon as you've found one, you can stop and return only the first element of the List and the Key.
My solution is an extension method of Dictionary<int, List<Channel>>. See Extension Methods Demystified.
public static IEnumerable<NewObject> ExtractNewObjects(this Dictionary<int, List<Channel>> dictionary,
IEnumerable<OtherItem> otherList)
{
// I'll only use the ChannelIds of the otherList, so extract them
IEnumerable<int> otherChannelIds = otherList
.Select(otherItem => otherItem.ChannelId);
return dictionary.ExtractNewObjects(otherChannelIds);
}
This calls the other ExtractNewObjects:
public static IEnumerable<NewObject> ExtractNewObjects(this Dictionary<int, List<Channel>> dictionary,
    IEnumerable<int> otherChannelIds)
{
    // duplicate ChannelIds will be removed automatically
    var otherChannelIdsSet = new HashSet<int>(otherChannelIds);
    foreach (KeyValuePair<int, List<Channel>> keyValuePair in dictionary)
    {
        // is any ChannelId in the list also in otherChannelIdsSet?
        // every keyValuePair.Value is a List<Channel>
        // every Channel has a ChannelId
        // channelId found if any of these ChannelIds is in the HashSet
        bool channelIdFound = keyValuePair.Value
            .Any(channel => otherChannelIdsSet.Contains(channel.ChannelId));
        if (channelIdFound)
        {
            yield return new NewObject()
            {
                NewObjectYear = keyValuePair.Key,
                NewObjectName = keyValuePair.Value
                    .Select(channel => channel.ChannelName)
                    .FirstOrDefault(),
            };
        }
    }
}
usage:
IEnumerable<OtherItem> otherList = ...
Dictionary<int, List<Channel>> dictionary = ...
IEnumerable<NewObject> extractedNewObjects = dictionary.ExtractNewObjects(otherList);
var someNewObjects = extractedNewObjects
.Take(5) // here we see the benefit from the yield return
.ToList();
We can see four improvements:
The use of HashSet<int> enables a very fast lookup to see if the ChannelId is in otherList.
The use of Any() stops enumerating the List<Channel> as soon as we've found a matching ChannelId in the HashSet.
The use of yield return means you don't enumerate over more elements in your Dictionary than you'll actually use.
The use of Select and FirstOrDefault when creating NewObjectName prevents exceptions if the List<Channel> is empty.

How to store the result of a linq query in a KeyDictionary variable

So I have a collection of objects who have multiple properties, two of these are groupname and personname. Now I need to count in the collection how many of each object belong to a certain group and person. So in other words, I need to group by groupname, then personname and then count how many objects have this combination. First I created this
public MultiKeyDictionary<string, string, int> GetPersonsPerGroup(IEnumerable<Home> homes ,List<string> gr, List<string> na)
{
List<string> groups = gr;
groups.Add("");
List<string> names = na;
names.Add("");
List<Home> Filtered = homes.ToList();
Filtered.ForEach(h => h.RemoveNull());
var result = new MultiKeyDictionary<string, string, int>();
int counter1 = 0;
foreach (var g in groups)
{
int counter2 = 0;
foreach (var n in names)
{
int counter3 = 0;
foreach (Home h in Filtered)
{
if (h.GroupName == g && h.PersonName == n)
{
counter3++;
if (counter3 > 100)
break;
}
}
if (counter3 > 0)
{
result.Add(g,n,counter3);
}
counter2++;
}
counter1++;
}
return result;
}
This may look fine, but the problem is that the "homes" parameter can contain more than 10000 objects, with more than 1500 unique names and around 200 unique groups. That makes the nested loops run on the order of billions of times, which really slows my program down. So I need another way of handling this, which made me decide to try LINQ. That led to this:
var newList = Filtered.GroupBy(x => new { x.GroupName, x.PersonName })
.Select(y => (MultiKeyDictionary<string, string, int>)result.Add(y.Key.GroupName, y.Key.PersonName, y.ToList().Count));
This gives the error "Cannot convert type 'void' to 'MultiKeyDictionary<string,string,int>'" and I have no idea how to solve it. How can I store the result of this query in one MultiKeyDictionary without having to iterate over each possible combination and count all of them?
Some information:
MultiKeyDictionary is a class I defined (something I found on here, actually); it's just a normal dictionary but with two keys associated with one value.
The RemoveNull() method on the Home object makes sure that all the properties of the Home object are not null. If it is the case the value gets sets to something not null ("null", basic date, 0, ...).
The parameters are:
homes = a list of Home objects received from an other class
gr = a list of all the unique groups in the list of homes
na = a list of all the unique names in the list of homes
The same name can occur on different groups
Hopefully someone can help me get further!
Thanks in advance!

Select must return something. You are not returning anything, only adding to an existing dictionary. Do this instead:
var newList = Filtered.GroupBy(x => new { x.GroupName, x.PersonName });
var result = new MultiKeyDictionary<string, string, int>();
foreach (var y in newList)
{
    result.Add(y.Key.GroupName, y.Key.PersonName, y.Count());
}
The reason you are getting the error
"Cannot convert type 'void' to 'MultiKeyDictionary<string,string,int>'"
is that you are trying to cast the value returned from Add, which is void, to MultiKeyDictionary<string,string,int>, which clearly cannot be done.

If MultiKeyDictionary requires the two keys to match in order to find a result, then you might want to just use a regular Dictionary with a Tuple as a composite type. C# 7 has features that make this pretty easy:
public Dictionary<(string, string), int> GetPersonsPerGroup(IEnumerable<Home> homes ,List<string> gr, List<string> na)
{
    return homes.GroupBy(x => (x.GroupName, x.PersonName))
        .ToDictionary(g => g.Key, g => g.Count());
}
You can even associate optional compile-time names with your tuple's values, by declaring it like this: Dictionary<(string groupName, string personName), int>.

Your grouping key anonymous object should work fine as a standard Dictionary key, so no reason to create a new type of Dictionary unless it offers special access via single keys, so just convert the grouping to a standard Dictionary:
var result = Filtered.GroupBy(f => new { f.GroupName, f.PersonName })
.ToDictionary(fg => fg.Key, fg => fg.Count());
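As a rough, self-contained sketch of the tuple-keyed approach suggested above: the Home class here is a minimal assumed shape with only the two properties the grouping needs, and HomeCounter is a hypothetical name. It needs C# 7.1 or later for the inferred tuple element names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the Home type from the question (assumed shape).
public class Home
{
    public string GroupName { get; set; }
    public string PersonName { get; set; }
}

// Hypothetical helper, not part of the original post.
public static class HomeCounter
{
    // Single pass over the data: group by the (group, person) pair
    // and store the size of each group.
    public static Dictionary<(string GroupName, string PersonName), int> CountPerGroupAndPerson(
        IEnumerable<Home> homes)
    {
        return homes
            .GroupBy(h => (h.GroupName, h.PersonName))
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

A count can then be read with a tuple key, for example counts[("G1", "Ann")], or with TryGetValue when the combination may be absent. Only combinations that actually occur get an entry, so the lists of all groups and all names are not needed.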

Making a list distinct in C#

In C#, I have an object type 'A' that contains a list of key value pairs.
The key value pairs is a category string and a value string.
To instantiate object type A, I would have to do the following:
List<KeyValuePair<string, string>> keyValuePairs = new List<KeyValuePair<string, string>>();
keyValuePairs.Add(new KeyValuePair<string, string>("Country", "U.S.A"));
keyValuePairs.Add(new KeyValuePair<string, string>("Name", "Mo"));
keyValuePairs.Add(new KeyValuePair<string, string>("Age", "33"));
A a = new A(keyValuePairs);
Eventually, I will have a List of A objects, and I want to manipulate the list so that I only get unique values, based only on the country name. Therefore, I want the list to be reduced to have only ONE "Country", "U.S.A", even if it appears more than once.
I was looking into the LINQ Distinct, but it does not do what I want because I can't define any parameters and because it doesn't seem to be able to catch two equivalent objects of type A. I know that I can override the "Equals" method, but that still doesn't solve my problem, which is to make the list distinct based on ONE of the key value pairs.

To expand upon Karl Anderson's suggestion of using morelinq, if you're unable to (or don't want to) link to another dll for your project, I implemented this myself awhile ago:
public static IEnumerable<T> DistinctBy<T, U>(this IEnumerable<T> source, Func<T, U>selector)
{
var contained = new Dictionary<U, bool>();
foreach (var elem in source)
{
U selected = selector(elem);
bool has;
if (!contained.TryGetValue(selected, out has))
{
contained[selected] = true;
yield return elem;
}
}
}
Used as follows:
collection.DistinctBy(elem => elem.Property);
In versions of .NET that support it, you can use a HashSet<U> instead of a Dictionary<U, bool>, since we don't really care what the value is, only whether the key has already been seen.
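A sketch of the HashSet variant mentioned above might look like this. The method is named DistinctByKey here (a name chosen for this example) to avoid clashing with the DistinctBy method that ships in System.Linq from .NET 6 onwards, which can be used directly on those versions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original post.
public static class DistinctExtensions
{
    // Yields the first element seen for each distinct key, in source order.
    public static IEnumerable<T> DistinctByKey<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (T item in source)
        {
            // HashSet<T>.Add returns false when the key is already present,
            // so one call both tests and records the key.
            if (seen.Add(keySelector(item)))
            {
                yield return item;
            }
        }
    }
}
```

Used the same way as before: collection.DistinctByKey(elem => elem.Property).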

Check out the DistinctBy syntax in the morelinq project.
A a = new A(keyValuePairs);
a = a.DistinctBy(k => new { k.Key, k.Value }).ToList();

You need to select the property you want to make distinct first.
Because it's a list inside a list, you can use SelectMany, which concatenates the results of the sub-selections.
List<A> listOfA = new List<A>();
listOfA.SelectMany(a => a.KeyValuePairs
.Where(keyValue => keyValue.Key == "Country")
.Select(keyValue => keyValue.Value))
.Distinct();
This should be it. It selects all values where the key is "Country" and concatenates the lists. Finally, it makes the countries distinct. This assumes that the KeyValuePairs property of class A is at least an IEnumerable<KeyValuePair<string, string>>.

var result = keyValuePairs.GroupBy(x => x.Key)
.SelectMany(g => g.Key == "Country" ? g.Distinct() : g);

You can use GroupBy. From there you can do all kinds of cool stuff:
listOfA.GroupBy(i => i.Value)
You can group by the value and then sum all the keys, or do something else useful.

Detecting "near duplicates" using a LINQ/C# query

I'm using the following queries to detect duplicates in a database.
Using a LINQ join doesn't work very well because Company X may also be listed as CompanyX, therefore I'd like to amend this to detect "near duplicates".
var results = result
.GroupBy(c => new {c.CompanyName})
.Select(g => new CompanyGridViewModel
{
LeadId = g.First().LeadId,
Qty = g.Count(),
CompanyName = g.Key.CompanyName,
}).ToList();
Could anybody suggest a way in which I have better control over the comparison? Perhaps via an IEqualityComparer (although I'm not exactly sure how that would work in this situation)
My main goals are:
To list the first record with a subset of all duplicates (or "near duplicates")
To have some flexibility over the fields and text comparisons I use for my duplicates.

For your explicit "ignoring spaces" case, you can simply call
var results = result.GroupBy(c => c.Name.Replace(" ", ""))...
However, in the general case where you want flexibility, I'd build up a library of IEqualityComparer<Company> classes to use in your groupings. For example, this should do the same in your "ignore space" case:
public class CompanyNameIgnoringSpaces : IEqualityComparer<Company>
{
public bool Equals(Company x, Company y)
{
return x.Name.Replace(" ", "") == y.Name.Replace(" ", "");
}
public int GetHashCode(Company obj)
{
return obj.Name.Replace(" ", "").GetHashCode();
}
}
which you could use as
var results = result.GroupBy(c => c, new CompanyNameIgnoringSpaces())...
It's pretty straightforward to do similar things containing multiple fields, or other definitions of similarity, etc.
Just note that your definition of "similar" must be transitive, e.g. if you're looking at integers you can't define "similar" as "within 5", because then you'd have "0 is similar to 5" and "5 is similar to 10" but not "0 is similar to 10". (It must also be reflexive and symmetric, but that's more straightforward.)
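One way to get a transitive definition for free is to compare normalized keys, since two names are then "similar" exactly when their normalized forms are equal. The sketch below assumes in-memory data (LINQ to Objects) and a made-up normalization that drops spaces and ignores case; the case handling, NormalizedNameComparer and DuplicateFinder are all additions for this example rather than anything from the answer above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: names are equal when their normalized forms match.
public sealed class NormalizedNameComparer : IEqualityComparer<string>
{
    // Assumed normalization: remove spaces and upper-case the rest.
    private static string Normalize(string name) =>
        (name ?? string.Empty).Replace(" ", "").ToUpperInvariant();

    public bool Equals(string x, string y) => Normalize(x) == Normalize(y);

    // Must agree with Equals: equal names produce equal hash codes.
    public int GetHashCode(string obj) => Normalize(obj).GetHashCode();
}

// Hypothetical helper, not part of the original post.
public static class DuplicateFinder
{
    // Returns the first name seen and the group size for every
    // group that contains more than one name.
    public static List<(string FirstName, int Count)> FindNearDuplicates(
        IEnumerable<string> companyNames)
    {
        return companyNames
            .GroupBy(name => name, new NormalizedNameComparer())
            .Where(g => g.Count() > 1)
            .Select(g => (FirstName: g.First(), Count: g.Count()))
            .ToList();
    }
}
```

Swapping in a different Normalize function changes what counts as a near duplicate without touching the query.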

Okay, so since you're looking for different permutations you could do something like this:
Bear in mind this was written in the answer so it may not fully compile, but you get the idea.
var results = result
.Where(g => CompanyNamePermutations(g.Key.CompanyName).Contains(g.Key.CompanyName))
.GroupBy(c => new {c.CompanyName})
.Select(g => new CompanyGridViewModel
{
LeadId = g.First().LeadId,
Qty = g.Count(),
CompanyName = g.Key.CompanyName,
}).ToList();
private static List<string> CompanyNamePermutations(string companyName)
{
// build your permutations here
// so to build the one in your example
return new List<string>
{
companyName,
string.Join("", companyName.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries))
};
}

In this case you need to define where the work is going to take place i.e. fully on the server, in local memory or a mixture of both.
In local memory:
In this case we have two routes: pull back all the data and do the logic in local memory, or stream the data and apply the logic piecewise. To pull all the data, just call ToList() or ToArray() on the base table. To stream the data, I would suggest using ToLookup() with a custom IEqualityComparer, e.g.
public class CustomEqualityComparer: IEqualityComparer<String>
{
public bool Equals(String str1, String str2)
{
//custom logic
}
public int GetHashCode(String str)
{
// custom logic
}
}
//result
var results = result.ToLookup(r => r.Name,
new CustomEqualityComparer())
.Select(r => ....)
Fully on the server:
Depends on your provider and what it can successfully map. E.g. if we define a near duplicate as one with an alternative delimiter one could do something like this:
private char[] delimiters = new char[]{' ','-','*'};
var results = result.GroupBy(r => delimiters.Aggregate( d => r.Replace(d,'')...
Mixture:
In this case we are splitting the work between the two. Unless you come up with a nice scheme this route is most likely to be inefficient. E.g. if we keep the logic on the local side, build groupings as a mapping from a name into a key and just query the resulting groupings we can do something like this:
var groupings = result.Select(r => r.Name)
//pull into local memory
.ToArray()
//do local grouping logic...
//Query results
var results = result.GroupBy(r => groupings[r]).....
Personally I usually go with the first option, pulling all the data for small data sets and streaming large data sets (empirically I found streaming with logic between each pull takes a lot longer than pulling all the data then doing all the logic)
Notes: Dependent on the provider ToLookup() is usually immediate execution and in construction applies its logic piecewise.
