Tell LINQ Distinct which item to return - c#

I understand how to do a Distinct() on an IEnumerable, and that I have to create an IEqualityComparer for more advanced cases. However, is there a way to tell it which of the duplicated items to return?
For example, say you have a List<T>:
List<MyClass> test = new List<MyClass>();
test.Add(new MyClass {ID = 1, InnerID = 4});
test.Add(new MyClass {ID = 2, InnerID = 4});
test.Add(new MyClass {ID = 3, InnerID = 14});
test.Add(new MyClass {ID = 4, InnerID = 14});
You then do:
var distinctItems = test.Distinct(new DistinctItemComparer());
class DistinctItemComparer : IEqualityComparer<MyClass> {
public bool Equals(MyClass x, MyClass y) {
return x.InnerID == y.InnerID;
}
public int GetHashCode(MyClass obj) {
return obj.InnerID.GetHashCode();
}
}
This code returns the items with ID 1 and 3. Is there a way to return the items with ID 2 and 4 instead?

I don't believe it's actually guaranteed, but I'd be very surprised to see the behaviour of Distinct change from returning items in the order they occur in the source sequence.
So, if you want particular items, you should order your source sequence that way. For example:
test.OrderByDescending(x => x.ID)
.Distinct(new DistinctItemComparer());
Note that one alternative to using Distinct with a custom comparer is to use DistinctBy from MoreLINQ:
test.OrderByDescending(x => x.ID)
.DistinctBy(x => x.InnerID);
Although you can't guarantee that the normal LINQ to Objects ordering from Distinct won't change, I'd be happy to add a guarantee to MoreLINQ :) (It's the only ordering that is sensible anyway, to be honest.)
Yet another alternative would be to use GroupBy instead - then for each inner ID you can get all the matching items, and go from there.
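To make the order-then-deduplicate idea above concrete, here is a minimal sketch using the question's MyClass. The names InnerIdComparer, DistinctDemo and HighestIdPerInnerId are made up for illustration, and the code relies on the assumption discussed above: Distinct keeps the first occurrence of each key, which is the observed behaviour rather than a documented guarantee.

```csharp
using System.Collections.Generic;
using System.Linq;

public class MyClass
{
    public int ID { get; set; }
    public int InnerID { get; set; }
}

// Hypothetical comparer: two items are equal when their InnerID matches.
public class InnerIdComparer : IEqualityComparer<MyClass>
{
    public bool Equals(MyClass x, MyClass y) => x.InnerID == y.InnerID;
    public int GetHashCode(MyClass obj) => obj.InnerID.GetHashCode();
}

public static class DistinctDemo
{
    // Returns the IDs that survive Distinct when the highest ID is seen first.
    public static List<int> HighestIdPerInnerId(IEnumerable<MyClass> items) =>
        items.OrderByDescending(x => x.ID)      // put the preferred item first
             .Distinct(new InnerIdComparer())   // assumption: first occurrence is kept
             .Select(x => x.ID)
             .ToList();
}
```

With the four items from the question, this should give the items with ID 4 and 2, in that order.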

You don't want distinct then - you want to group your items and select the "maximum" element for them, based on ID:
var distinctItems = test.Distinct(new DistinctItemComparer());
var otherItems = test.GroupBy(a => a.InnerID, (innerID, values) => values.OrderBy(b => b.ID).Last());
var l1 = distinctItems.ToList();
var l2 = otherItems.ToList();
l1 = your current list
l2 = your desired list

This doesn't sound like a job for Distinct, this sounds like a job for Where. You want to filter the sequence in your case:
var ids = new[] { 2, 4 };
var newSeq = test.Where(m => ids.Contains(m.ID));

If you want to select one particular element from each group of elements that your comparison considers equal, you can use group by:
var q = from t in test
group t by t.InnerID into g
select g.First(...);
In the select clause, you'll get a collection of elements that are equal and you can select the one specific element you need (e.g. using First(...)). You actually don't need to add Distinct to the end, because you're already selecting only a single element for each of the groups.

No, there's no way.
Distinct() is used to find distinct elements. If you're worried about which element to return...then obviously they are not truly identical (and therefore not distinct) and you have a flaw in your design.

Related

Update a property field in a List

I have a List<Map> and I want to update the Map.Target property based on a matching value from another List<Map>.
Basically, the logic is:
If mapsList1.Name is equal to mapsList2.Name
Then mapsList1.Target = mapsList2.Name
The structure of the Map class looks like this:
public class Map {
public Guid Id { get; set; }
public string Name { get; set; }
public string Target { get; set; }
}
I tried the following but obviously it's not working:
List<Map> mapsList1 = new List<Map>();
List<Map> mapsList2 = new List<Map>();
// populate the 2 lists here
mapsList1.Where(m1 => mapsList2.Where(m2 => m1.Name == m2.Name) ) // don't know what to do next
The count of items in list 1 will always be greater than or equal to the count of items in list 2. There are no duplicates in either list.
Assuming there are a small number of items in the lists and only one item in list 1 that matches:
list2.ForEach(l2m => list1.First(l1m => l1m.Name == l2m.Name).Target = l2m.Target);
If more than one item in list1 must be updated, enumerate all of list1, doing a FirstOrDefault on list2:
list1.ForEach(l1m => l1m.Target = list2.FirstOrDefault(l2m => l1m.Name == l2m.Name)?.Target ?? l1m.Target);
If there are a large number of items in list2, turn it into a dictionary:
var d = list2.ToDictionary(m => m.Name);
list1.ForEach(m => m.Target = d.ContainsKey(m.Name) ? d[m.Name].Target : m.Target);
(Presumably list2 doesn't contain any repeated names)
If list1's names are unique and everything in list2 is in list1, you could even turn list1 into a dictionary and enumerate list2:
var d=list1.ToDictionary(m => m.Name);
list2.ForEach(m => d[m.Name].Target = m.Target);
If list2 has entries that are not in list1, or list1 has duplicate names, you could use a Lookup instead. You would just have to avoid the "Collection was modified; enumeration operation may not execute" exception you would get if you tried to modify the list it returns in response to a name.
mapsList1.Where(m1 => mapsList2.Where(m2 => m1.Name == m2.Name) ) // don't know what to do next
LINQ Where doesn't really work like that / that's not a statement in itself. The m1 is the entry from list1, and the inner Where would produce an enumerable of list 2 items, but it doesn't result in the Boolean the outer Where is expecting, nor can you do anything to either of the sequences because LINQ operations are not supposed to have side effects. The only thing you can do with a Where is capture or use the sequence it returns in some other operation (like enumerating it), so Where isn't really something you'd use for this operation unless you use it to find all the objects you need to alter. It's probably worth pointing out that ForEach is a list thing, not a LINQ thing, and is basically just another way of writing foreach(var item in someList)
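As a rough illustration of the point above (use LINQ only to find the items, then mutate them in an ordinary loop), here is a sketch that pairs the two lists with Join. The names MapUpdater and CopyTargets are invented for this example, and it assumes the goal is to copy Target from the second list to the first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Map
{
    public Guid Id { get; set; }
    public string Name { get; set; } = "";
    public string Target { get; set; } = "";
}

public static class MapUpdater
{
    // Copies Target from each source map onto the destination map with the same Name.
    public static void CopyTargets(List<Map> destination, List<Map> source)
    {
        // The query only pairs up matching items; it has no side effects.
        var pairs = destination.Join(
            source,
            d => d.Name,
            s => s.Name,
            (d, s) => new { Destination = d, Source = s });

        // The mutation happens in a plain loop, outside the query.
        foreach (var pair in pairs)
        {
            pair.Destination.Target = pair.Source.Target;
        }
    }
}
```

Items in the first list without a match in the second are simply left untouched, since Join is an inner join.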
If the collections are big enough, a better approach would be to create a dictionary to look up the targets:
List<Map> mapsList1 = new List<Map>();
List<Map> mapsList2 = new List<Map>();
var dict = mapsList2
.GroupBy(map => map.Name)
.ToDictionary(maps => maps.Key, maps => maps.First().Target);
foreach (var map in mapsList1)
{
if (dict.TryGetValue(map.Name, out var target))
{
map.Target = target;
}
}
Note that if mapsList2 contains duplicate names, only the first entry for each name is used.

Sort in-memory list by another in-memory list

Is it possible to sort an in-memory list by another list (the second list would be a reference data source or something similar)?
public class DataItem
{
public string Name { get; set; }
public string Path { get; set; }
}
// a list of Data Items, randomly sorted
List<DataItem> dataItems = GetDataItems();
// the sort order data source with the paths in the correct order
IEnumerable<string> sortOrder = new List<string> {
"A",
"A.A1",
"A.A2",
"A.B1"
};
// is there a way to tell linq to sort the in-memory list of objects
// by the sortOrder "data source"
dataItems = dataItems.OrderBy(p => p.Path == sortOrder).ToList();
First, let's assign an index to each item in sortOrder:
var sortOrderWithIndices = sortOrder.Select((x, i) => new { path = x, index = i });
Next, we join the two lists and sort:
var dataItemsOrdered =
from d in dataItems
join x in sortOrderWithIndices on d.Path equals x.path //pull index by path
orderby x.index //order by index
select d;
This is how you'd do it in SQL as well.
Here is an alternative (and I argue more efficient) approach to the one accepted as answer.
List<DataItem> dataItems = GetDataItems();
IDictionary<string, int> sortOrder = new Dictionary<string, int>()
{
{"A", 0},
{"A.A1", 1},
{"A.A2", 2},
{"A.B1", 3},
};
dataItems.Sort((di1, di2) => sortOrder[di1.Path].CompareTo(sortOrder[di2.Path]));
Let's say Sort() and OrderBy() both take O(n*logn), where n is number of items in dataItems. The solution given here takes O(n*logn) to perform the sort. We assume the step required to create the dictionary sortOrder has a cost not significantly different from creating the IEnumerable in the original post.
Doing a join and then sorting the collection, however adds an additional cost O(nm) where m is number of elements in sortOrder. Thus the total time complexity for that solution comes to O(nm + nlogn).
In theory, the approach using join may boil down to O(n * (m + logn)) ~= O(n*logn) any way. But in practice, join is costing extra cycles. This is in addition to possible extra space complexity incurred in the linq approach where auxiliary collections might have been created in order to process the linq query.
If your list of paths is large, you would be better off performing your lookups against a dictionary:
var sortValues = sortOrder.Select((p, i) => new { Path = p, Value = i })
.ToDictionary(x => x.Path, x => x.Value);
dataItems = dataItems.OrderBy(di => sortValues[di.Path]).ToList();
Custom ordering is done by using a custom comparer (an implementation of the IComparer<T> interface) that is passed as the second argument to the OrderBy method.
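A comparer along those lines could look roughly like this. PathOrderComparer and PathSorting are hypothetical names, and two choices here are my own assumptions rather than anything from the question: paths missing from the reference list sort last, and the reference list contains no duplicate paths.

```csharp
using System.Collections.Generic;
using System.Linq;

public class DataItem
{
    public string Name { get; set; } = "";
    public string Path { get; set; } = "";
}

// Compares paths by their position in a reference list.
public class PathOrderComparer : IComparer<string>
{
    private readonly Dictionary<string, int> _rank;

    public PathOrderComparer(IEnumerable<string> sortOrder)
    {
        // Map each path to its index; ToDictionary throws on duplicate paths.
        _rank = sortOrder.Select((path, index) => new { path, index })
                         .ToDictionary(x => x.path, x => x.index);
    }

    public int Compare(string x, string y) => Rank(x).CompareTo(Rank(y));

    // Assumption: unknown (or null) paths are placed after all known ones.
    private int Rank(string path) =>
        path != null && _rank.TryGetValue(path, out var rank) ? rank : int.MaxValue;
}

public static class PathSorting
{
    // The comparer is passed as the second argument to OrderBy.
    public static List<DataItem> SortByPath(IEnumerable<DataItem> items, IEnumerable<string> sortOrder) =>
        items.OrderBy(d => d.Path, new PathOrderComparer(sortOrder)).ToList();
}
```

Building the dictionary once in the constructor keeps each comparison a constant-time lookup.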

Check if list<t> contains any of another list

I have a list of parameters like this:
public class parameter
{
public string name {get; set;}
public string paramtype {get; set;}
public string source {get; set;}
}
IEnumerable<parameter> parameters;
And an array of strings I want to check it against.
string[] myStrings = new string[] { "one", "two"};
I want to iterate over the parameter list and check whether the source property is equal to any element of the myStrings array. I can do this with nested foreach loops, but I would like to learn a nicer way. I have been playing around with LINQ and like the extension methods on Enumerable such as Where, so nested foreach loops just feel wrong. Is there a more elegant, preferred LINQ/lambda/delegate way to do this?
Thanks
You could use a nested Any() for this check which is available on any Enumerable:
bool hasMatch = myStrings.Any(x => parameters.Any(y => y.source == x));
For larger collections it is faster to project parameters to source and then use Intersect, which internally uses a hash-based set. Instead of O(n^2) for the first approach (the equivalent of two nested loops), you can do the check in O(n):
bool hasMatch = parameters.Select(x => x.source)
.Intersect(myStrings)
.Any();
Also as a side comment you should capitalize your class names and property names to conform with the C# style guidelines.
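The same hash-based idea can also be written out by hand, which makes it easier to see where the O(n) comes from. This is only a sketch: ParameterMatcher and AnySourceMatches are invented names, and the class uses the capitalized naming suggested above rather than the question's lowercase one.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Parameter
{
    public string Name { get; set; } = "";
    public string ParamType { get; set; } = "";
    public string Source { get; set; } = "";
}

public static class ParameterMatcher
{
    // True if any parameter's Source equals one of the candidate strings.
    public static bool AnySourceMatches(IEnumerable<Parameter> parameters, IEnumerable<string> candidates)
    {
        // Build the set once, so each lookup below is a hash lookup.
        var lookup = new HashSet<string>(candidates);

        // Any stops at the first match instead of scanning everything.
        return parameters.Any(p => lookup.Contains(p.Source));
    }
}
```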
Here is a sample that checks whether two lists have any matching elements:
List<int> nums1 = new List<int> { 2, 4, 6, 8, 10 };
List<int> nums2 = new List<int> { 1, 3, 6, 9, 12};
if (nums1.Any(x => nums2.Any(y => y == x)))
{
Console.WriteLine("There are equal elements");
}
else
{
Console.WriteLine("No Match Found!");
}
If both lists are large, the nested lambda approach can take a long time. In that case it is better to use a join, which also returns the matching parameters:
var items = (from x in parameters
join y in myStrings on x.source equals y
select x)
.ToList();
list1.Select(l1 => l1.Id).Intersect(list2.Select(l2 => l2.Id)).ToList();
var list1 = await _service1.GetAll();
var list2 = await _service2.GetAll();
// Create a list of Ids from list1
var list1_Ids = list1.Select(l => l.Id).ToList();
// Filter list2 according to list1 Ids (a new variable name, since list2 is already declared)
var filteredList2 = list2.Where(l => list1_Ids.Contains(l.Id)).ToList();

Remove items from one list in another

I'm trying to figure out how to traverse a generic list of items that I want to remove from another list of items.
So let's say I have this as a hypothetical example
List<car> list1 = GetTheList();
List<car> list2 = GetSomeOtherList();
I want to traverse list1 with a foreach and remove each item in List1 which is also contained in List2.
I'm not quite sure how to go about that as foreach is not index based.
You can use Except:
List<car> list1 = GetTheList();
List<car> list2 = GetSomeOtherList();
List<car> result = list2.Except(list1).ToList();
You probably don't even need those temporary variables:
List<car> result = GetSomeOtherList().Except(GetTheList()).ToList();
Note that Except does not modify either list - it creates a new list with the result.
You don't need an index, as the List<T> class allows you to remove items by value rather than index by using the Remove function.
foreach(car item in list1) list2.Remove(item);
In my case I had two different lists, with a common identifier, kind of like a foreign key.
The second solution cited by "nzrytmn":
var result = list1.Where(p => !list2.Any(x => x.ID == p.ID && x.property1 == p.property1)).ToList();
Was the one that best fit in my situation.
I needed to load a DropDownList without the records that had already been registered.
Thank you !!!
This is my code:
t1 = new T1();
t2 = new T2();
List<T1> list1 = t1.getList();
List<T2> list2 = t2.getList();
ddlT3.DataSource= list2.Where(s => !list1.Any(p => p.Id == s.ID)).ToList();
ddlT3.DataTextField = "AnyThing";
ddlT3.DataValueField = "IdAnyThing";
ddlT3.DataBind();
I would recommend using the LINQ extension methods. You can easily do it with one line of code like so:
list2 = list2.Except(list1).ToList();
This is assuming of course the objects in list1 that you are removing from list2 are the same instance.
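If the two lists hold different instances that represent the same car, Except can be given an IEqualityComparer so that it compares by value instead. Below is a sketch under the assumption that a car is identified by an ID property; Car, CarIdComparer and CarFilter are illustrative names, not anything from the question.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Car
{
    public int ID { get; set; }
    public string Model { get; set; } = "";
}

// Hypothetical comparer: two cars are the same when their IDs match.
public class CarIdComparer : IEqualityComparer<Car>
{
    public bool Equals(Car x, Car y) => x.ID == y.ID;
    public int GetHashCode(Car obj) => obj.ID.GetHashCode();
}

public static class CarFilter
{
    // Returns the cars in source whose ID does not appear in toRemove.
    public static List<Car> Without(IEnumerable<Car> source, IEnumerable<Car> toRemove) =>
        source.Except(toRemove, new CarIdComparer()).ToList();
}
```

One thing to keep in mind is that Except is a set operation, so duplicates within the source list are also collapsed in the result.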
list1.RemoveAll(l => list2.Contains(l));
You could use LINQ, but I would go with RemoveAll method. I think that is the one that better expresses your intent.
var integers = new List<int> { 1, 2, 3, 4, 5 };
var remove = new List<int> { 1, 3, 5 };
integers.RemoveAll(i => remove.Contains(i));
Solution 1 : You can do like this :
List<car> result = GetSomeOtherList().Except(GetTheList()).ToList();
In some cases, however, this solution may not work. If it does not, you can use my second solution.
Solution 2 :
List<car> list1 = GetTheList();
List<car> list2 = GetSomeOtherList();
Assume that list1 is your main list and list2 is your secondary list, and you want the items of list1 that are not in list2:
var result = list1.Where(p => !list2.Any(x => x.ID == p.ID && x.property1 == p.property1)).ToList();
As Except does not modify the list, you can use ForEach on List<T>:
list2.ForEach(item => list1.Remove(item));
It may not be the most efficient way, but it is simple, therefore readable, and it updates the original list (which is my requirement).
I think it would be quick to convert list A to a dictionary and then foreach the second list and call DictA.Remove(item) otherwise I think most solutions will cause many iterations through list A either directly or under the covers.
If the lists are small, it probably won't matter.
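A variation on that idea, sketched here with a HashSet instead of a dictionary (my substitution, since only membership is needed): hash the items to remove once, then let RemoveAll make a single pass over the target list. ListRemoval and RemoveMatching are made-up names, and the sketch assumes T has sensible Equals/GetHashCode behaviour.

```csharp
using System.Collections.Generic;

public static class ListRemoval
{
    // Removes from target every item that also occurs in toRemove.
    // Returns the number of removed items. Modifies target in place.
    public static int RemoveMatching<T>(List<T> target, IEnumerable<T> toRemove)
    {
        // One pass to build the set, then constant-time membership checks.
        var removeSet = new HashSet<T>(toRemove);

        // RemoveAll walks target once and compacts it in place.
        return target.RemoveAll(item => removeSet.Contains(item));
    }
}
```

This avoids rescanning the removal list for every element, which is what Contains on a List does.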
In case you have two different lists with different data models:
List<FeedbackQuestionsModel> feedbackQuestionsList = new();
List<EmployeesFeedbacksQuestionsModel> employeeQuestionsList = new();
var resultList = feedbackQuestionsList.Where(p => !employeeQuestionsList.Any(x => x.Question == p.Question)).ToList();
feedbackQuestionsList = resultList.ToList();
Here ya go..
List<string> list = new List<string>() { "1", "2", "3" };
List<string> remove = new List<string>() { "2" };
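// Iterate over a copy: removing from a list while ForEach runs over that same list can throw InvalidOperationException.
list.ToList().ForEach(s =>
{
if (remove.Contains(s))
{
list.Remove(s);
}
});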

LINQ Keyword search with orderby relevance based on count (LINQ to SQL)

This is what I have now as a very basic search:
var Results = from p in dx.Listings select p;
if (CategoryId > 0) Results = Results.Where(p => p.CategoryId == CategoryId);
if (SuburbId > 0) Results = Results.Where(p => p.SuburbId == SuburbId);
var OrderedResults = Results.OrderByDescending(p => p.ListingType);
OrderedResults = OrderedResults.ThenByDescending(p => p.Created);
I understand that I can add in a .Contains() or similar and put in keywords from a keyword box (split into individual items), and that should get the list of results.
However, I need to order the results by basic relevance. That means that if record A contains 2 of the keywords (in the 'Body' nvarchar(MAX) field), it should rank higher than record B, which only matches 1 of the keywords. I don't need a full count of every hit, although if that's easier to manage it would be fine.
So is there any way to get the hit count directly as part of the orderby? I can manage it by fetching the results and parsing them, but I really don't want to do that, as parsing possibly thousands of records could bog down the IIS machine, while the SQL Server is a decently powerful cluster :)
If anyone has any ideas or tips it would be a big help.
If I understand you correctly, you want to call OrderByDescending( p => p.Body ), but ordered by how many times a certain word appears in p.Body?
Then you should be able to create a method that counts the occurrences and returns the count, and then you can simply do OrderByDescending( p => CountOccurences(p.Body) )
Alternatively, you can create a BodyComparer class that implements IComparer and pass it to OrderByDescending.
EDIT:
Take a look at this link: Enable Full Text Searching
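For what it's worth, a counting helper like the CountOccurences mentioned above might look something like the sketch below. KeywordRelevance, CountMatches and OrderByRelevance are hypothetical names. One important caveat: this runs in memory (LINQ to Objects). LINQ to SQL cannot translate a custom method like this into SQL, so it does not address the wish to keep the work on the SQL Server.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordRelevance
{
    // Counts how many distinct keywords occur in the body (case-insensitive).
    public static int CountMatches(string body, IEnumerable<string> keywords)
    {
        if (string.IsNullOrEmpty(body)) return 0;

        return keywords
            .Where(k => !string.IsNullOrWhiteSpace(k))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .Count(k => body.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0);
    }

    // Keeps bodies with at least one hit, most hits first.
    public static List<string> OrderByRelevance(IEnumerable<string> bodies, IReadOnlyCollection<string> keywords) =>
        bodies.Select(b => new { Body = b, Hits = CountMatches(b, keywords) })
              .Where(x => x.Hits > 0)
              .OrderByDescending(x => x.Hits)
              .Select(x => x.Body)
              .ToList();
}
```

This counts each keyword at most once per body, which matches the "2 of the keywords beats 1 of the keywords" requirement rather than a full hit count.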
Here is a simple example, if I understand what you're looking for correctly:
var storedData = new[]{
new int[] {1, 2, 3, 4},
new int[] {1, 2, 3, 4, 5}
};
var itemsFromTextBox = new[] { 3, 4, 5 };
var query = storedData.Where(a => a.ContainsAny(itemsFromTextBox))
.OrderByDescending(a => itemsFromTextBox.Sum(i => a.Contains(i)? 1:0));
With the following ContainsAny extension:
public static bool ContainsAny<T>(this IEnumerable<T> e1, IEnumerable<T> e2)
{
foreach (var item in e2)
{
if (e1.Contains(item)) return true;
}
return false;
}
