Making a list distinct in C#

In C#, I have an object type 'A' that contains a list of key-value pairs.
Each key-value pair consists of a category string and a value string.
To instantiate an object of type A, I have to do the following:
List<KeyValuePair<string, string>> keyValuePairs = new List<KeyValuePair<string, string>>();
keyValuePairs.Add(new KeyValuePair<string, string>("Country", "U.S.A"));
keyValuePairs.Add(new KeyValuePair<string, string>("Name", "Mo"));
keyValuePairs.Add(new KeyValuePair<string, string>("Age", "33"));
A a = new A(keyValuePairs);
Eventually, I will have a List of A objects, and I want to reduce that list so that it contains only unique entries, judged solely by the country name. In other words, the list should end up with only ONE object whose pair is "Country", "U.S.A", even if several such objects appear.
I looked into LINQ's Distinct, but it does not do what I want: I can't pass it any parameters, and it doesn't seem to recognize two equivalent objects of type A as equal. I know that I can override the "Equals" method, but that still doesn't solve my problem, which is to make the list distinct based on ONE of the key-value pairs.

To expand upon Karl Anderson's suggestion of using morelinq: if you're unable to (or don't want to) link to another DLL for your project, here is a version I implemented myself a while ago. As with any extension method, it has to live in a static class:
public static IEnumerable<T> DistinctBy<T, U>(this IEnumerable<T> source, Func<T, U> selector)
{
var contained = new Dictionary<U, bool>();
foreach (var elem in source)
{
U selected = selector(elem);
bool has;
if (!contained.TryGetValue(selected, out has))
{
contained[selected] = true;
yield return elem;
}
}
}
Used as follows:
collection.DistinctBy(elem => elem.Property);
In versions of .NET that support it, you can use a HashSet<U> instead of a Dictionary<U, bool>, since we don't care what the stored value is, only whether the key has already been seen.
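For what it's worth, the HashSet variant could look roughly like the sketch below. This is not the morelinq implementation. The names DistinctByExtensions and DistinctByKey are made up for this example (a different method name avoids an ambiguity with the Enumerable.DistinctBy that newer .NET versions ship), and the sample data is invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Invented sample data: two entries share the same country.
var people = new[]
{
    new { Name = "Mo", Country = "U.S.A" },
    new { Name = "Al", Country = "Canada" },
    new { Name = "Jo", Country = "U.S.A" },
};

// Keeps the first element seen for each country.
var distinct = people.DistinctByKey(p => p.Country).ToList();
Console.WriteLine(string.Join(", ", distinct.Select(p => p.Name))); // prints "Mo, Al"

public static class DistinctByExtensions
{
    // Hypothetical name, chosen so it cannot clash with a built-in DistinctBy.
    public static IEnumerable<T> DistinctByKey<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> selector)
    {
        var seen = new HashSet<TKey>();
        foreach (var elem in source)
        {
            // HashSet<T>.Add returns false when the key is already present.
            if (seen.Add(selector(elem)))
            {
                yield return elem;
            }
        }
    }
}
```

Because the method uses yield return, nothing is evaluated until the result is enumerated, which is the same behaviour as the Dictionary version above.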

Check out the DistinctBy method in the morelinq project. It works on the sequence of pairs, so apply it before constructing A:
keyValuePairs = keyValuePairs.DistinctBy(k => new { k.Key, k.Value }).ToList();
A a = new A(keyValuePairs);

You need to select the property you want to make distinct first.
Because it's a list inside a list, you can use SelectMany, which concatenates the results of the inner selections.
List<A> listOfA = new List<A>();
var countries = listOfA.SelectMany(a => a.KeyValuePairs
.Where(keyValue => keyValue.Key == "Country")
.Select(keyValue => keyValue.Value))
.Distinct();
That should be it. It selects all values whose key is "Country", concatenates them into one sequence, and finally removes duplicate countries. This assumes that the KeyValuePairs property of class A is at least an IEnumerable<KeyValuePair<string, string>>.

var result = keyValuePairs.GroupBy(x => x.Key)
.SelectMany(g => g.Key == "Country" ? g.Distinct() : g);

You can use GroupBy. From there you can do all kinds of useful things:
listOfA.GroupBy(i=>i.Value)
You can group by the value and then sum all the keys, or do something else useful with each group.
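If the goal is to keep one whole A object per country, a grouping-based sketch might look like the following. The shape of class A is an assumption (the question never shows it): here it simply exposes the list it was constructed with as a KeyValuePairs property. The sample entries are invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var listOfA = new List<A>
{
    new A(new List<KeyValuePair<string, string>>
    {
        new KeyValuePair<string, string>("Country", "U.S.A"),
        new KeyValuePair<string, string>("Name", "Mo"),
    }),
    new A(new List<KeyValuePair<string, string>>
    {
        new KeyValuePair<string, string>("Country", "U.S.A"),
        new KeyValuePair<string, string>("Name", "Al"),
    }),
    new A(new List<KeyValuePair<string, string>>
    {
        new KeyValuePair<string, string>("Country", "Canada"),
        new KeyValuePair<string, string>("Name", "Jo"),
    }),
};

// Group by the value stored under "Country" and keep the first A of each group.
// FirstOrDefault yields a default pair (null Value) when an A has no "Country" entry,
// so such objects all end up in one group with a null key.
List<A> onePerCountry = listOfA
    .GroupBy(a => a.KeyValuePairs.FirstOrDefault(kv => kv.Key == "Country").Value)
    .Select(g => g.First())
    .ToList();

Console.WriteLine(onePerCountry.Count); // prints 2

// Assumed shape of A; the real class may differ.
public class A
{
    public A(List<KeyValuePair<string, string>> keyValuePairs)
    {
        KeyValuePairs = keyValuePairs;
    }

    public List<KeyValuePair<string, string>> KeyValuePairs { get; }
}
```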

Related

Flatten a Dictionary<int, List<object>>

I have a dictionary which has an integer Key that represents a year, and a Value which is a list of object Channel. I need to flatten the data and create a new object from it.
Currently, my code looks like this:
Dictionary<int, List<Channel>> myDictionary;
foreach(var x in myDictionary)
{
var result = (from a in x.Value
from b in anotherList
where a.ChannelId == b.ChannelId
select new NewObject
{
NewObjectYear = x.Key,
NewObjectName = a.First().ChannelName,
}).ToList();
list.AddRange(result);
}
Notice that I am using the Key to be the value of property NewObjectYear.
I want to get rid of foreach since the dictionary contains a lot of data and doing some joins inside the iteration makes it very slow. So I decided to refactor and came up with this:
var flatten = myDictionary.SelectMany(x => x.Value.Select(y =>
new KeyValuePair<int, Channel>(x.Key, y))).ToList();
But with this, I couldn't get the Key directly. Using something like flatten.Select(x => x.Key) is definitely not the correct way. So I tried finding other ways to flatten that would be favorable for my scenario but failed. I also thought about creating a class which will contain the year and the list from the flattened but I don't know how.
Please help me with this.
Also, is there also another way that doesn't have the need to create a new class?

It seems to me that you are only filtering, so you do not need a join for that:
var anotherListIDs = new HashSet<int>(anotherList.Select(c => c.ChannelId));
foreach (var x in myDictionary)
{
list.AddRange(x.Value
.Where(c => anotherListIDs.Contains(c.ChannelId))
.Select(c => new NewObject
{
NewObjectYear = x.Key,
NewObjectName = c.ChannelName,
}));
}
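If you also want to drop the foreach, the same filter can be written with SelectMany, which still has the dictionary key in scope inside the inner Select. This is only a sketch: the Channel and NewObject classes below are guesses at the shapes used in the question, anotherList is assumed to hold Channel objects, and the sample data is invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var myDictionary = new Dictionary<int, List<Channel>>
{
    [2019] = new List<Channel>
    {
        new Channel { ChannelId = 1, ChannelName = "News" },
        new Channel { ChannelId = 2, ChannelName = "Sport" },
    },
    [2020] = new List<Channel>
    {
        new Channel { ChannelId = 1, ChannelName = "News" },
    },
};
var anotherList = new List<Channel> { new Channel { ChannelId = 1 } };

// Build the set of wanted ids once, so each membership check is a hash lookup.
var anotherListIDs = new HashSet<int>(anotherList.Select(c => c.ChannelId));

// pair.Key (the year) is captured by the inner lambda, so no extra class is needed.
List<NewObject> list = myDictionary
    .SelectMany(pair => pair.Value
        .Where(c => anotherListIDs.Contains(c.ChannelId))
        .Select(c => new NewObject
        {
            NewObjectYear = pair.Key,
            NewObjectName = c.ChannelName,
        }))
    .ToList();

Console.WriteLine(list.Count); // prints 2

// Assumed shapes; the real classes may have more members.
public class Channel
{
    public int ChannelId { get; set; }
    public string ChannelName { get; set; } = "";
}

public class NewObject
{
    public int NewObjectYear { get; set; }
    public string NewObjectName { get; set; } = "";
}
```

Note that removing the foreach by itself is unlikely to make anything faster; the gain comes from replacing the join with the HashSet lookup.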

You do realise that if the second element of the list in a particular dictionary entry has a matching ChannelId, you return the first element of that list, don't you?
var otherList = new OtherItem[]
{
new OtherItem() {ChannelId = 1, ...}
};
var dictionary = new Dictionary<int, List<Channel>>
{
{ 10, // Key
new List<Channel>() // Value
{
new Channel() {ChannelId = 100, ChannelName = "100"},
new Channel() {ChannelId = 1, ChannelName = "1"},
}
},
};
Although the 2nd element has a matching ChannelId, you return the ChannelName of the first element.
Anyway, let's assume this is what you really want. You are right, your function isn't very efficient.
Your dictionary implements IEnumerable<KeyValuePair<int, List<Channel>>>. Therefore every x in your foreach is a KeyValuePair<int, List<Channel>>. Every x.Value is a List<Channel>.
So for every element in your dictionary (which is a KeyValuePair<int, List<Channel>>), you take the complete list and perform a full inner join of it with otherList, and for the result you take the key of the KeyValuePair and the first element of the List in the KeyValuePair.
And even if you don't use the complete result, but only the first element or the first few (because of FirstOrDefault() or Take(3)), you still do this for every element of every list in your Dictionary.
Indeed your query could be much more efficient.
As you use the ChannelIds in otherList only to find out whether one is present, a major improvement would be to put them in a HashSet<int>, which gives you a very fast lookup to check whether the ChannelId of one of the values in your Dictionary is in the set.
So for every element in your dictionary, you only have to check every ChannelId in the list to see if one of them is in the HashSet. As soon as you've found one, you can stop and return only the first element of the List and the Key.
My solution is an extension method of Dictionary<int, List<Channel>>. See Extension Methods Demystified.
public static IEnumerable<NewObject> ExtractNewObjects(this Dictionary<int, List<Channel>> dictionary,
IEnumerable<OtherItem> otherList)
{
// I'll only use the ChannelIds of the otherList, so extract them
IEnumerable<int> otherChannelIds = otherList
.Select(otherItem => otherItem.ChannelId);
return dictionary.ExtractNewObjects(otherChannelIds);
}
This calls the other ExtractNewObjects overload:
public static IEnumerable<NewObject> ExtractNewObjects(this Dictionary<int, List<Channel>> dictionary,
IEnumerable<int> otherChannelIds)
{
var otherChannelIdsSet = new HashSet<int>(otherChannelIds);
// duplicate channelIds will be removed automatically
foreach (KeyValuePair<int, List<Channel>> keyValuePair in dictionary)
{
// is any ChannelId in the list also in otherChannelIdsSet?
// every keyValuePair.Value is a List<Channel>
// every Channel has a ChannelId
// channelId found if any of these ChannelIds is in the HashSet
bool channelIdFound = keyValuePair.Value
.Any(channel => otherChannelIdsSet.Contains(channel.ChannelId));
if (channelIdFound)
{
yield return new NewObject()
{
NewObjectYear = keyValuePair.Key,
NewObjectName = keyValuePair.Value
.Select(channel => channel.ChannelName)
.FirstOrDefault(),
};
}
}
}
Usage:
IEnumerable<OtherItem> otherList = ...
Dictionary<int, List<Channel>> dictionary = ...
IEnumerable<NewObject> extractedNewObjects = dictionary.ExtractNewObjects(otherList);
var someNewObjects = extractedNewObjects
.Take(5) // here we see the benefit from the yield return
.ToList();
We can see four improvements:
The use of HashSet<int> enables a very fast lookup to see if the ChannelId is in otherList.
The use of Any() stops enumerating the List<Channel> as soon as we've found a matching ChannelId in the HashSet.
The use of yield return means that you don't enumerate over more elements in your Dictionary than you'll actually use.
The use of Select and FirstOrDefault when creating NewObjectName prevents exceptions if List<Channel> is empty.

List<T> Concatenation dynamically

I am trying to concatenate List<> instances as follows:
List<Student> finalList = new List<Student>();
var sortedDict = dictOfList.OrderBy(k => k.Key);
foreach (KeyValuePair<int, List<Student>> entry in sortedDict) {
List<Student> ListFromDict = (List<Student>)entry.Value;
finalList.Concat(ListFromDict);
}
But no concatenation happens. finalList remains empty. Any help?

A call to Concat does not modify the original list. Instead it returns a new sequence; to be precise, it returns an IEnumerable<Student> that will produce the contents of both lists concatenated, without modifying either of them.
You probably want to use AddRange which does what you want:
List<Student> ListFromDict = (List<Student>)entry.Value;
finalList.AddRange(ListFromDict);
Or even shorter (in one line of code):
finalList.AddRange((List<Student>)entry.Value);
And because entry.Value is already of type List<Student>, you can use just this:
finalList.AddRange(entry.Value);
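The difference between the two calls is easy to see in a small stand-alone sketch. It uses lists of int with invented values instead of Student, purely to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var finalList = new List<int> { 1, 2 };
var more = new List<int> { 3, 4 };

// Concat builds a new lazy sequence; finalList itself is untouched.
IEnumerable<int> combined = finalList.Concat(more);
int countAfterConcat = finalList.Count;
int combinedCount = combined.Count();
Console.WriteLine(countAfterConcat); // prints 2
Console.WriteLine(combinedCount);    // prints 4

// AddRange appends to the existing list in place.
finalList.AddRange(more);
Console.WriteLine(finalList.Count);  // prints 4
```

Keep in mind that the sequence returned by Concat is evaluated lazily, so enumerating it again after the AddRange call would also see the newly added elements.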

Other answers have explained why Concat isn't helping you - but they've all kept your original loop. There's no need for that - LINQ has you covered:
List<Student> finalList = dictOfList.OrderBy(k => k.Key)
.SelectMany(pair => pair.Value)
.ToList();
To be clear, this replaces the whole of your existing code, not just the body of the loop.
Much simpler :) Whenever you find yourself using a foreach loop which does nothing but build another collection, it's worth seeing whether you can eliminate that loop using LINQ.

You may want to read the documentation on Enumerable.Concat:
Return Value
Type: System.Collections.Generic.IEnumerable<TSource>
An IEnumerable<T> that contains the concatenated elements of the two input sequences.
So you need to use the return value, which holds the concatenated elements.
As an alternative, you can use List<T>.AddRange, which adds the elements of the specified collection to the end of the List<T>.
As an aside, you can also achieve your goal with a simple LINQ query:
var finalList = dictOfList.OrderBy(k => k.Key)
.SelectMany(k => k.Value)
.ToList();

As specified here, Concat generates a new sequence whereas AddRange actually adds the elements to the list. You thus should rewrite it to:
List<Student> finalList = new List<Student>();
var sortedDict = dictOfList.OrderBy(k => k.Key);
foreach (KeyValuePair<int, List<Student>> entry in sortedDict) {
List<Student> ListFromDict = (List<Student>)entry.Value;
finalList.AddRange(ListFromDict);
}
Furthermore, you can simplify the code a bit by omitting the cast to List<Student>, since entry.Value is already a List<Student> (and AddRange technically only needs an IEnumerable<T>):
var sortedDict = dictOfList.OrderBy(k => k.Key);
foreach (KeyValuePair<int, List<Student>> entry in sortedDict) {
finalList.AddRange(entry.Value);
}

The Concat method does not modify the original collection; instead it returns a new sequence with the concatenation result. So either try finalList = finalList.Concat(ListFromDict).ToList() or use the AddRange method, which modifies the target list.

Dynamically create anonymous object from list values c#

I have a list (or can be array) of strings that I want to dynamically create an anonymous object from. How do I do this?
var dataSet = new DataSet();
dataSet.ReadXml(@"");
var dataTable = dataSet.Tables[0];
var dataRow = dataTable.Rows[0];
var keys = new List<string> {"Column1","Column2"};
var result = new {keys[0] = dataRow[keys[0]], keys[1] = dataRow[keys[1]]}
So that list named "keys" is going to be created outside this method and can contain one or more values. I tried creating a dictionary, looping through the list, and adding key/value pairs to the dictionary, but then I couldn't figure out how to convert the dictionary back to an anonymous type. I also experimented with ExpandoObject, but that didn't get me any further.
I must be able to return an anonymous type, as the result of this method will be used in the GroupBy clause of a LINQ query.
Here is the method I had to dynamically create the dictionary:
public object Key(DataRow dataRow, List<String> keys)
{
var dictionary = new Dictionary<string, object>();
foreach (string key in keys)
{
dictionary.Add(key, dataRow[key]);
}
return dictionary;
}
Here is my LINQ query:
var duplicates = dataTable.AsEnumerable().GroupBy(r => Key(r, keys)).Where(c => c.Count() > 1).ToList();
The GroupBy clause works if I hardcode in an anonymous type from the Key() method. Basically I just need the GroupBy clause to be dynamically set based upon the values in the keys list.

Stripping down your question, what you want is to be able to group a list of items based on a runtime property which could be composed of one or more properties of that item. In essence, it means you need a selector function (which is your Key method) that transforms an item into a key.
In order for GroupBy to work, it needs to be able to compare any two instances of the key to see if they're equal. This means the key needs to implement a meaningful Equals() method, or you need an IEqualityComparer implementation that does the work for you. In this case I wouldn't bother with creating a new Key, just write an Equality Comparer that can compare two DataRows directly:
var duplicates = dataTable
.AsEnumerable()
.GroupBy(r => r, new MyDataRowComparer(keys))
.Where(c => c.Count() > 1)
.ToList();
internal class MyDataRowComparer : IEqualityComparer<DataRow>
{
private readonly string[] _keys;
public MyDataRowComparer(IEnumerable<string> keys)
{
_keys = keys.ToArray(); // keep the keys to compare by.
}
public bool Equals(DataRow x, DataRow y)
{
// a simple implementation that checks if all the required fields
// match. This might need more work.
bool areEqual = true;
foreach (var key in _keys)
{
// object.Equals compares the boxed values; == would only compare references.
areEqual &= object.Equals(x[key], y[key]);
}
return areEqual;
}
public int GetHashCode(DataRow obj)
{
// Combine the hash codes of the key fields so that equal rows hash alike.
int hash = 17;
foreach (var key in _keys)
{
hash = unchecked(hash * 31 + obj[key].GetHashCode());
}
return hash;
}
}
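To see why the dictionary returned by Key() did not group as hoped while a hardcoded anonymous type did, it may help to compare how the two handle equality. The column names and values below are invented for the illustration.

```csharp
using System;
using System.Collections.Generic;

// Two anonymous objects with the same property names, types, and values.
var anon1 = new { Column1 = "a", Column2 = 1 };
var anon2 = new { Column1 = "a", Column2 = 1 };
bool anonEqual = anon1.Equals(anon2);
Console.WriteLine(anonEqual); // prints True: anonymous types compare property by property

// Two dictionaries with identical contents.
var dict1 = new Dictionary<string, object> { ["Column1"] = "a", ["Column2"] = 1 };
var dict2 = new Dictionary<string, object> { ["Column1"] = "a", ["Column2"] = 1 };
bool dictEqual = dict1.Equals(dict2);
Console.WriteLine(dictEqual); // prints False: Dictionary only has reference equality
```

With dictionary keys, GroupBy therefore puts every row into its own group, which is why a custom IEqualityComparer (or a key type with value equality) is needed.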

Need help understanding .Select method C#

I am having difficulty understanding what type of statement this is and how to use the .Select method.
var lines = System.IO.File.ReadLines(@"c:\temp\mycsvfil3.csv")
.Select(l => new
{
myIdentiafication= int.Parse(l.Split(',')[0].Trim()),
myName= l.Split(',')[1].Trim()
}
).OrderBy(i => i.Id);
Any help is appreciated!

The Enumerable.Select method is an extension method for an IEnumerable<T> type. It takes a Func<TSource, TResult> that allows you to take in your IEnumerable<T> items and project them to something else, such as a property of the type, or a new type. It makes heavy use of generic type inference from the compiler to do this without <> everywhere.
In your example, the IEnumerable<T> is the IEnumerable<string> of lines returned by File.ReadLines. The Select func creates an anonymous type (also making use of generic type inference) and assigns some properties based on splitting each line l, which is a string from your enumerable.
OrderBy is another IEnumerable<T> extension method; it returns the elements ordered by the key expression you provide.
T at this point is the anonymous type from the Select with two properties (myIdentiafication and myName), so the OrderBy(i => i.Id) bit won't compile. It can be fixed:
.OrderBy(i => i.myIdentiafication);

This is a LINQ query. Enumerable.Select projects each line from the file into an anonymous object with the properties myIdentiafication and myName. Then you sort the sequence of anonymous objects with Enumerable.OrderBy. But you have to order by a property that exists in the anonymous object, e.g. myIdentiafication, because there is no Id property:
var lines = File.ReadLines(@"c:\temp\mycsvfil3.csv") // get sequence of lines
.Select(l => new {
myIdentiafication = int.Parse(l.Split(',')[0].Trim()),
myName= l.Split(',')[1].Trim()
}).OrderBy(i => i.myIdentiafication);
NOTE: To avoid splitting each line twice, you can use query syntax and introduce new range variables:
var lines = from l in File.ReadLines(@"c:\temp\mycsvfil3.csv")
let pair = l.Split(',')
let id = Int32.Parse(pair[0].Trim())
orderby id
select new {
Id = id,
Name = pair[1].Trim()
};

From each string returned by ReadLines, Select creates an anonymous object with two properties (myIdentiafication and myName). Within the Select, the lambda parameter l represents a single line from the sequence returned by ReadLines.
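The same projection can be tried without a file. The sketch below swaps File.ReadLines for an in-memory array of invented lines; the property names Id and Name are chosen for the example and are not the ones from the question.

```csharp
using System;
using System.Linq;

// Stand-in for the lines of the CSV file.
string[] rawLines = { "2, Bob", "1, Alice" };

var people = rawLines
    .Select(l => l.Split(','))               // project each line to its fields
    .Select(parts => new                      // project the fields to an anonymous object
    {
        Id = int.Parse(parts[0].Trim()),
        Name = parts[1].Trim(),
    })
    .OrderBy(p => p.Id)                       // order by a property that actually exists
    .ToList();

foreach (var p in people)
{
    Console.WriteLine($"{p.Id}: {p.Name}");
}
// prints "1: Alice" and then "2: Bob"
```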

How to access the member of an IGrouping?

The two lines below return an IGrouping<string, DataRow>:
var mbVals = GetMBValues_MBvsPU(mbRptDataPkg);
var puVals = GetPUValues_MBvsPU(puRptDataPkg);
I'd thought that you could access the grouping's data like this: mbVals[StringKey], but that doesn't look possible. I can do a foreach over the DataRows, but it seems to me one should be able to access them easily through a LINQ expression somehow.
What I'd like to do is compare fields in the DataRows from one with fields in the DataRows from the other, matched by key.
Any thoughts?
Thanks!

IGrouping<> is an IEnumerable<>, so you can use the ElementAt() extension method.
But for the situation you describe, you may be better off using Zip() to unify the two groups (if the items are in the same order and match exactly) or using Join() if they don't.

An instance that implements IGrouping<T, U> has a (one) key of type T. Since you want to compare based on keys (plural), an IGrouping<string, DataRow> isn't what you need.
You need an IEnumerable<IGrouping<string, DataRow>> or an ILookup<string, DataRow>. Something that has many keys.
ILookup<string, DataRow> source1 = GetSource1();
ILookup<string, DataRow> source2 = GetSource2();
var BothKeyed =
(
from key in source1.Select(g => g.Key).Union(source2.Select(g => g.Key))
select new
{
Key = key,
In1 = source1[key],//note, In1 may be empty.
In2 = source2[key] //note, In2 may be empty.
}
).ToLookup(x => x.Key);
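For reference, here is a small sketch of how an ILookup is built and indexed. It uses invented anonymous objects instead of DataRows; the point is only that the indexer returns an empty sequence, rather than throwing, when a key is missing.

```csharp
using System;
using System.Linq;

var rows = new[]
{
    new { Key = "A", Value = 1 },
    new { Key = "A", Value = 2 },
    new { Key = "B", Value = 3 },
};

// ToLookup groups eagerly and allows access by key.
ILookup<string, int> lookup = rows.ToLookup(r => r.Key, r => r.Value);

string valuesForA = string.Join(",", lookup["A"]);
bool missingHasAny = lookup["missing"].Any();

Console.WriteLine(valuesForA);     // prints "1,2"
Console.WriteLine(missingHasAny);  // prints False: unknown keys yield an empty sequence
```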
