Check IEnumerable<T> for items having duplicate properties - c#

How can I check whether an IEnumerable has two or more items with the same property value?
For example, given a class
public class Item
{
public int Prop1 {get;set;}
public string Prop2 {get;set;}
}
and then a collection of type IEnumerable<Item>
I need to return false if there are items with duplicate values in Prop1.

You only want to check Prop1, right?
What about:
IEnumerable<Item> items = ...
var allDistinct = items.GroupBy(x => x.Prop1).All(x => x.Count() == 1);
// allDistinct is true if all items have different Prop1 values, false otherwise

I think this method will work.
public static bool ContainsDuplicates<T1, T2>(this IEnumerable<T1> source, Func<T1, T2> selector)
{
var d = new HashSet<T2>();
foreach(var t in source)
{
if(!d.Add(selector(t)))
{
return true;
}
}
return false;
}
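A minimal, self-contained sketch of the HashSet approach above, applied to the question's Item class. The class name DuplicateExtensions and the type parameter names are assumptions made for illustration; the logic relies only on HashSet<T>.Add returning false for a value that is already present.

```csharp
using System;
using System.Collections.Generic;

public class Item
{
    public int Prop1 { get; set; }
    public string Prop2 { get; set; }
}

// Hypothetical container class; extension methods must live in a static class.
public static class DuplicateExtensions
{
    // Returns true as soon as two elements project to the same key,
    // so the source is enumerated at most once.
    public static bool ContainsDuplicates<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> selector)
    {
        var seen = new HashSet<TKey>();
        foreach (var element in source)
        {
            // HashSet<T>.Add returns false when the key is already in the set.
            if (!seen.Add(selector(element)))
            {
                return true;
            }
        }
        return false;
    }
}
```

With the question's collection, !items.ContainsDuplicates(i => i.Prop1) gives the value the OP asked for: false when any Prop1 value repeats.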

A short, one-enumeration only solution would be:
public static bool ContainsDuplicates<T>(this IEnumerable<T> list)
=> !list.All(new HashSet<T>().Add);
which could be read as: A list has no duplicates when All items can be Add-ed to a set.
This is conceptually similar to Jake Pearson's solution; however, it leaves out the independent concept of projection. The OP's question would then be solved as:
items.Select(o => o.Prop1).ContainsDuplicates()

bool x = list.Distinct().SequenceEqual(list);
x is true if list has no duplicates, and false if it has any (the distinct sequence is then shorter than the original).

Have you tried Enumerable.Distinct<TSource>(IEnumerable<TSource>, IEqualityComparer<TSource>)?

You can select the distinct values from the IEnumerable and then check the count against that of the full collection.
Example:
var distinctItemCount = myEnumerable.Select(m => m.Prop1).Distinct().Count();
if(distinctItemCount < myEnumerable.Count())
{
return false;
}

This could potentially be made more performant, but it's the only correct answer so far.
// Create an enumeration of the distinct values of Prop1
var propertyCollection = objectCollection.Select(o => o.Prop1).Distinct();
// If the property collection has the same number of entries as the object
// collection, then all properties are distinct. Otherwise there are some
// duplicates.
return propertyCollection.Count() == objectCollection.Count();

public static class EnumerableEx
{
public static IEnumerable<T> GetDuplicates<T>(this IEnumerable<T> source)
{
return source.GroupBy(t => t).Where(x => x.Count() > 1).Select(x => x.Key);
}
}
Personally, I like the neatness of extension methods.
If your objects don't require a selector for determining equality, then this works nicely.

We can remove duplicate entries by using .Distinct() in ArrayList.
Example:
I have a createdby column in testtable with 5 duplicate entries. I have to get only one row
ID Createdby
=== ========
1 Reddy
2 Reddy
3 Reddy
4 Reddy
Considering the above table, I need to select only one "Reddy"
DataTable table=new DataTable("MyTable");//Actually I am getting this table data from database
DataColumn col=new DataColumn("Createdby");
var childrows = table.AsEnumerable().Select( row => row.Field<object>(col)).Distinct().ToArray();

Related

Linq: Select KeyValuePair from dictionary where value is a list of objects

I have a field that looks like:
public Dictionary<ClassA, List<ClassB>> MyDict;
Assume that:
public class ClassA
{
public string Name;
public int Id;
}
public class ClassB
{
public string Tag;
public string Text;
}
I'm trying to define a query of type IEnumerable<KeyValuePair<ClassA, IEnumerable<ClassB>>> where I define a condition on the value of ClassB.Tag. I tried things like:
IEnumerable<KeyValuePair<ClassA, IEnumerable<ClassB>>> q =
MyDict.Where(pair => pair.Value.Any(b => b.Tag == "a tag"));
But obviously the above is not what I need, because it returns the whole List<ClassB> if any item matches the condition, while what I want is an IEnumerable or a List of only the items that match the condition.
You need to construct the IEnumerable from a call to ToDictionary, using a projection that keeps only the matching ClassB items from each list, and then keep only the entries whose filtered list is not empty.
IEnumerable<KeyValuePair<ClassA,List<ClassB>>> q = MyDict.ToDictionary(
k => k.Key,
k => k.Value.Where(b => b.Tag == "10").ToList()
).Where(kv => kv.Value.Any());
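The same filtering can also be sketched without building an intermediate dictionary, by projecting each pair into a new KeyValuePair. The class DictionaryFilter and the method WithTag are hypothetical names used only for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public class ClassA
{
    public string Name;
    public int Id;
}

public class ClassB
{
    public string Tag;
    public string Text;
}

// Hypothetical helper, not part of the original answer.
public static class DictionaryFilter
{
    // Keeps only the ClassB entries whose Tag equals the given tag and
    // drops every key that is left with no matching entries.
    public static IEnumerable<KeyValuePair<ClassA, List<ClassB>>> WithTag(
        Dictionary<ClassA, List<ClassB>> source, string tag)
    {
        return source
            .Select(pair => new KeyValuePair<ClassA, List<ClassB>>(
                pair.Key,
                pair.Value.Where(b => b.Tag == tag).ToList()))
            .Where(pair => pair.Value.Count > 0);
    }
}
```

The query is lazy, so the filtered lists are rebuilt on every enumeration; appending ToList() to the result would materialize it once.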

Best way to find out distinct item in the big list

I have a following collection, it has more than 500000 items in it.
List<Item> MyCollection = new List<Item>();
and type:
class Item
{
public string Name { get; set; }
public string Description { get; set; }
}
I want to return a list of items having distinct Name, i.e. to find the distinct items based on name.
What are the possible ways, and which would be best in terms of time and memory? Both are important, but time has priority over memory.
I would opt for LINQ unless the performance turns out to be insufficient:
var considered = from i in MyCollection
group i by i.Name into g
select new { Name = g.Key, Cnt = g.Count(), Instance = g.First() };
var result = from c in considered where c.Cnt == 1 select c.Instance;
(Assuming I've interpreted your question correctly as "return those items whose Name only appears once in the list")
I have a Java version of the code.
Implement the Comparator, then define the method below in the Item class:
public int compare(MyObject o1, MyObject o2)
{
// return 0 if objects are equal in terms of your data members such as name or any
}
Then use the below code in the class in which MyCollection is defined
HashSet<Item> set1 = new HashSet<Item>();
set1.addAll(MyCollection);
MyCollection.clear();
MyCollection.addAll(set1);
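This will give you the list without duplicates. Note that a HashSet is not sorted, and that it detects duplicates through equals and hashCode rather than compare, so Item needs to override those as well.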
You can sort your list and then delete all repeated items, but storing all the data in a Dictionary<string, string> seems better suited to this task. Or maybe even put the whole list in a HashSet.
MoreLinq has a DistinctBy extension that is great for this sort of thing. It's open source and just a few lines of code, so it's easy to add to your code.
var results = MyCollection.DistinctBy(p => p.Name);
I can see you found your answer, but you can also do it fairly simply using Distinct;
internal class NameComparer : IEqualityComparer<Item> {
public bool Equals(Item x, Item y) { return x.Name == y.Name; }
public int GetHashCode(Item obj) { return obj.Name.GetHashCode(); }
}
var distinctItems = MyCollection.Distinct(new NameComparer());
First solution:
public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> sequence, Func<T, TKey> keySelector)
{
var alreadyUsed = new HashSet<TKey>();
foreach (var item in sequence)
{
var key = keySelector(item);
if (alreadyUsed.Add(key))
{
yield return item;
}
}
}
The second is to use .Distinct() and override Equals (and GetHashCode) in your Item class so that equality is based on Name.
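A minimal sketch of that second option, assuming that two items should count as equal whenever their names match. GetHashCode is overridden together with Equals because Distinct() uses both.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public string Name { get; set; }
    public string Description { get; set; }

    // Assumption for this sketch: items are "the same" when their names match.
    public override bool Equals(object obj)
    {
        return obj is Item other && Name == other.Name;
    }

    // Must agree with Equals: equal names have to produce equal hash codes.
    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }
}

// Hypothetical helper showing the call site.
public static class ItemQueries
{
    public static List<Item> DistinctByName(IEnumerable<Item> items)
    {
        // Distinct() picks up the overridden Equals/GetHashCode through
        // the default equality comparer.
        return items.Distinct().ToList();
    }
}
```

Changing equality on the type affects every dictionary, set, and comparison that uses Item, which is why the comparer or DistinctBy variants are often preferred when name-based equality is needed in only one place.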

How to select the same list but with an additional variable set?

I have a List<MyObject>
MyObject is as follows.
public class MyObject
{
public bool available;
public bool online;
}
Now when I retrieve this List<MyObject> from another function, only the available field is set. Now I want to set the online field of each MyObject.
What I currently do is
List<MyObject> objectList = getMyObjectList();
objectList.ForEach(x => x.online = IsOnline(x));
Now that the online property is set, I want to filter again using Where to select MyObject that is both available and online.
objectList.Where(x => x.available && x.online);
While I understand that the code above is working and readable, I am curious to know whether there is a LINQ way of selecting the same object but with a variable initialized so I can combine all the three lines to one line. Unfortunately ForEach does not return the list back.
Something like
getMyObjectList().BetterForEach(x => x.online = IsOnline(x)).Where(x => x.available && x.online);
BetterForEach will return x with the previous values set and with the online field set as well.
Is there any function/way to achieve this using LINQ?
UPDATE
I've removed other fields of MyObject. MyObject does not only contain these fields but many more. I'd rather not create new instances of MyObject.
The simplest solution is probably to make an extension method that is like ForEach but returns the list for chaining:
public static List<T> ForEachThen<T>(this List<T> source, Action<T> action)
{
source.ForEach(action);
return source;
}
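As a sketch of how that extension chains with Where, the example below combines the three lines from the question into one expression. The isOnline delegate is a hypothetical stand-in for the question's IsOnline(x), and the Demo class exists only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyObject
{
    public bool available;
    public bool online;
}

public static class ListExtensions
{
    // Runs the action on every element, then returns the same list for chaining.
    public static List<T> ForEachThen<T>(this List<T> source, Action<T> action)
    {
        source.ForEach(action);
        return source;
    }
}

public static class Demo
{
    // isOnline is a hypothetical stand-in for the question's IsOnline(x).
    public static List<MyObject> AvailableAndOnline(
        List<MyObject> objects, Func<MyObject, bool> isOnline)
    {
        return objects
            .ForEachThen(x => x.online = isOnline(x)) // mutates the existing instances
            .Where(x => x.available && x.online)      // then filters them
            .ToList();
    }
}
```

No new MyObject instances are created: the returned list holds references to the same objects that were passed in, with their online field updated.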
Linq is for querying data, not for updating it. So any option you have will not be too pretty, but there are still some options.
You could do this:
var result =
objectList.Select(x =>
{
x.online = IsOnline(x);
return x;
});
However, that is pretty bad practice. This would be better:
var result =
objectList.Select(x => new MyObject
{
available = x.available,
online = IsOnline(x)
});
But this creates a collection of new objects, not related to your original set.
This kind of code generally indicates there might be something wrong with your basic design. Personally, I'd go with something like this (if you can set up a static method to do the work of IsOnline):
public class MyObject
{
public bool Available;
public bool Online { get { return MyObjectHelper.IsOnline(this); } }
}
...
var result = objectList.Where(x => x.Available && x.Online);
Or if you can't set up a static method, maybe this field doesn't need to be in MyObject class at all?
public class MyObject
{
public bool Available;
}
...
var result = objectList.Where(x => x.Available && IsOnline(x));
var selected = getMyObjectList()
.Select(x => new MyObject { available = x.available, online = IsOnline(x) })
.Where(x => x.available && x.online);
assuming you need to access online in the resulting list..
You could also add a method to MyObject like:
public MyObject SetOnline(bool isOnline) {
this.online = isOnline;
return this;
}
and then do:
var selected = getMyObjectList()
.Select(x => x.SetOnline( IsOnline(x) ))
.Where(x => x.available && x.online);

Using .Select and .Where in a single LINQ statement

I need to gather Distinct Id's from a particular table using LINQ. The catch is I also need a WHERE statement that should filter the results based only from the requirements I've set. Relatively new to having to use LINQ so much, but I'm using the following code more or less:
private void WriteStuff(SqlHelper db, EmployeeHelper emp)
{
String checkFieldChange;
AnIList tableClass = new AnIList(db, (int)emp.PersonId);
var linq = tableClass.Items
.Where(
x => x.UserId == emp.UserId
&& x.Date > DateBeforeChanges
&& x.Date < DateAfterEffective
&& (
(x.Field == Inserted)
|| (x.Field == Deleted))
)
.OrderByDescending(x => x.Id);
if (linq != null)
{
foreach (TableClassChanges item in linq)
{
AnotherIList payTxn = new AnotherIList(db, item.Id);
checkFieldChange = GetChangeType(item.FieldName);
// Other codes that will retrieve data from each item
// and write it into a text file
}
}
}
I tried to add .Distinct for var linq but it's still returning duplicate items (meaning having the same Id's). I've read through a lot of sites and have tried adding a .Select into the query but the .Where clause breaks instead. There are other articles where the query is somehow different with the way it retrieves the values and place it in a var. I also tried to use .GroupBy but I get an "At least one object must implement IComparable" when using Id as a key.
The query actually works and I'm able to output the data from the columns with the specifications I require, but I just can't seem to make .Distinct work (which is the only thing really missing). I tried to create two vars, with one triggering a distinct call, and then use a nested foreach to ensure the values are unique, but with thousands of records to gather, the performance impact is just too much.
I'm unsure as well if I'd have to override or use IEnumerable for my requirement, and thought I'd ask the question around just in case there's an easier way, or if it's possible to have both .Select and .Where working in just one statement?
Did you add the Select() after the Where() or before?
You should add it after, because of the order of operations:
1 Take the entire table
2 Filter it accordingly
3 Select only the ID's
4 Make them distinct.
If you do a Select first, the Where clause can only contain the ID attribute because all other attributes have already been edited out.
Update: For clarity, this order of operators should work:
db.Items.Where(x=> x.userid == user_ID).Select(x=>x.Id).Distinct();
You probably want to add a .ToList() at the end, but that's optional :)
In order for Enumerable.Distinct to work for your type, you can implement IEquatable<T> and provide suitable definitions for Equals and GetHashCode, otherwise it will use the default implementation: comparing for reference equality (assuming that you are using a reference type).
From the manual:
The Distinct<TSource>(IEnumerable<TSource>) method returns an unordered sequence that contains no duplicate values. It uses the default equality comparer, Default, to compare values.
The default equality comparer, Default, is used to compare values of the types that implement the IEquatable<T> generic interface. To compare a custom data type, you need to implement this interface and provide your own GetHashCode and Equals methods for the type.
In your case it looks like you might just need to compare the IDs, but you may also want to compare other fields too depending on what it means for you that two objects are "the same".
You can also consider using DistinctBy from morelinq.
Note that this is LINQ to Objects only, but I assume that's what you are using.
Yet another option is to combine GroupBy and First:
var query = // your query here...
.GroupBy(x => x.Id)
.Select(g => g.First());
This would also work in LINQ to SQL, for example.
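A minimal LINQ to Objects sketch of the GroupBy and First combination, with the filter applied before the grouping. The Change class is a hypothetical stand-in for the question's TableClassChanges, and only the fields needed for the example are included.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type standing in for the question's TableClassChanges.
public class Change
{
    public int Id { get; set; }
    public int UserId { get; set; }
    public string Field { get; set; }
}

public static class ChangeQueries
{
    public static List<Change> DistinctById(IEnumerable<Change> changes, int userId)
    {
        return changes
            .Where(c => c.UserId == userId)  // filter first, while all fields are available
            .GroupBy(c => c.Id)              // one group per Id
            .Select(g => g.First())          // keep the first row of each group
            .OrderByDescending(c => c.Id)
            .ToList();
    }
}
```

Unlike Select(x => x.Id).Distinct(), this keeps the whole row for each Id, so the later loop can still read the other fields.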
Since you are trying to compare two different objects, you will first need to implement the IEqualityComparer interface. Here is example code for a simple console application that uses Distinct with a simple implementation of IEqualityComparer:
class Program
{
static void Main(string[] args)
{
List<Test> testData = new List<Test>()
{
new Test(1,"Test"),
new Test(2, "Test"),
new Test(2, "Test")
};
var result = testData.Where(x => x.Id > 1).Distinct(new MyComparer());
}
}
public class MyComparer : IEqualityComparer<Test>
{
public bool Equals(Test x, Test y)
{
return x.Id == y.Id;
}
public int GetHashCode(Test obj)
{
// Hash only the field that Equals compares, so equal items share a hash code.
return obj.Id.GetHashCode();
}
}
public class Test
{
public Test(int id, string name)
{
this.id = id;
this.name = name;
}
private int id;
public int Id
{
get { return id; }
set { id = value; }
}
private string name;
public string Name
{
get { return name; }
set { name = value; }
}
}
I hope that helps.
Did you pass an IEqualityComparer<T> to .Distinct()?
Something like this:
internal abstract class BaseComparer<T> : IEqualityComparer<T> {
public bool Equals(T x, T y) {
return GetHashCode(x) == GetHashCode(y);
}
public abstract int GetHashCode(T obj);
}
internal class DetailComparer : BaseComparer<StyleFeatureItem> {
public override int GetHashCode(StyleFeatureItem obj) {
return obj.ID.GetHashCode();
}
}
Usage:
list.Distinct(new DetailComparer())
You can easily query with LINQ like this
considering this JSON
{
"items": [
{
"id": "10",
"name": "one"
},
{
"id": "12",
"name": "two"
}
]
}
putting it in a variable called json like this,
JObject json = JObject.Parse("{'items':[{'id':'10','name':'one'},{'id':'12','name':'two'}]}");
you can select all ids from the items where name is "one" using the following LINQ query
var Ids =
from item in json["items"]
where (string)item["name"] == "one"
select item["id"];
Then, you will have the result in an IEnumerable list

Find Index from List

I have a Dictionary that contains thread Information Dictionary<String,Thread>
"2FF"
"2IE"
"7CH"
etc
What I know is the integers (2, 7, etc.). I want to find out how many strings in the Dictionary contain the given integer and, if there is one, get that string.
E.g.
String GetString(int integer)
{
// if the Dictionary contains the given integer, return the whole string in which that integer is present
}
With LINQ syntax:
var matchingThreads = from pair in dictionary
where pair.Key.StartsWith(number.ToString())
select pair.Value;
With traditional syntax:
var matchingThreads = dictionary
.Where(pair => pair.Key.StartsWith(number.ToString()))
.Select(pair => pair.Value);
If you only need to count them and you don't care about the Thread objects, you can use:
int count = dictionary.Keys.Count(key => key.StartsWith(number.ToString()));
Note that you need a using System.Linq directive.
Maybe a List<CustomClass> would be a better choice here where CustomClass would look like:
public sealed class CustomClass
{
public Thread Thread { get; set; }
public string String { get; set; }
}
(Better property names are always good, of course :-) )
A dictionary is not suitable if you do not know the exact keys, or only know parts of them.
You could then use LINQ to find out what you want, e.g.:
int count = list.Where(c => c.String.StartsWith(integer.ToString())).Count();
//or
IEnumerable<string> strings = list.Where(c => c.String.StartsWith(integer.ToString())).Select(c => c.String);
public IEnumerable<string> GetMatchingKeys(int value)
{
var valueText = value.ToString();
return _dictionary.Keys.Where(key => key.Contains(valueText));
}
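The answers above differ in one detail: StartsWith matches only keys that begin with the number, while Contains also matches keys such as "72X" when looking for 2. A small sketch of the prefix variant follows; the KeyLookup class and its method name are assumptions made for illustration, and the value type is left generic so the example does not depend on Thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answers.
public static class KeyLookup
{
    // Returns every key that begins with the decimal text of the number.
    public static List<string> KeysStartingWith<TValue>(
        IDictionary<string, TValue> dictionary, int number)
    {
        // Convert once instead of once per key.
        string prefix = number.ToString();
        return dictionary.Keys
            .Where(key => key.StartsWith(prefix, StringComparison.Ordinal))
            .ToList();
    }
}
```

The count the question asks for is the Count of the returned list, and the matching strings are its elements. The order of the keys is not guaranteed, since a Dictionary does not define an enumeration order.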
