I'm trying to develop a LINQ query that will identify objects that have duplicate values. I only need the objects where a string in a multivalued attribute matches a string in the same attribute on another object AND the "name" values don't match.
I am trying to use the following code, but it does not work because it doesn't seem possible to use the "o" variable in a subquery.
myList.Where(o => myList.Any(a => a.name != o.name && a.multival.Any(p => o.multival.Contains(p))))
Why even use LINQ for this? It will be convoluted and difficult to read. I would solve this problem with a nested foreach loop:
var listOfDuplicates = new List<YourObjectType>();
foreach (var a in myList)
{
    foreach (var b in myList)
    {
        // Different names, and at least one multival string in common.
        if (a.name != b.name && a.multival.Intersect(b.multival).Any())
        {
            listOfDuplicates.Add(a);
            break; // one match is enough; avoids adding "a" twice
        }
    }
}
In response to comments, this is how one would implement a method that exits early, similar to LINQ's FirstOrDefault() and other methods that stop after a given number of matches:
public IEnumerable<YourObjectType> FindDuplicates(IEnumerable<YourObjectType> myList, int maxDupes)
{
    var listOfDuplicates = new List<YourObjectType>();
    foreach (var a in myList)
    {
        foreach (var b in myList)
        {
            // Different names, and at least one multival string in common.
            if (a.name != b.name && a.multival.Intersect(b.multival).Any())
            {
                listOfDuplicates.Add(a);
                if (listOfDuplicates.Count == maxDupes)
                    return listOfDuplicates;
                break; // move on to the next "a"
            }
        }
    }
    return listOfDuplicates;
}
Your query should actually "work," but it's not going to be very efficient if your list size is particularly large. If you're having troubles compiling, check to be sure you do not have any typos. If you're having problems at runtime, add some null checks on your variables and properties. The rest of this answer is to guide how you might utilize Linq to make your query better.
Given the query you have attempted to write, I am going to infer that the following closely approximates the relevant parts of your class structure, though I'm using a different name for what you have as "multival."
class Foo
{
public string Name { get; set; }
public string[] Attributes { get; set; }
}
And then given an object list looking roughly like this
var mylist = new List<Foo>
{
new Foo { Name = "Alpha", Attributes = new[] { "A", "B", "C" } },
new Foo { Name = "Bravo", Attributes = new[] { "D", "E", "F" } },
new Foo { Name = "Charlie", Attributes = new[] { "G", "H", "A" } }
};
For finding objects that match any other object based on any match of an attribute, this is how I would approach it using Linq:
var part1 = from item in mylist
from value in item.Attributes
select new { item, value };
var query = (from pairA in part1
join pairB in part1 on pairA.value equals pairB.value
where pairA.item.Name != pairB.item.Name
select pairA.item)
.Distinct(); // ToList() to materialize, as necessary
If you were to run that through your editor of choice and explore the contents of query, you would expect to see objects "Alpha" and "Charlie" based on the shared attribute of "A".
This approach should scale much better than a nested foreach (which is effectively what your initial query is) should your initial list be large, for example 10,000 elements instead of 3.
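As an alternative sketch of the same idea, the flattened (item, value) pairs can be grouped by value instead of joined. This assumes the Foo class shown above; the names DuplicateFinder and FindSharedAttributeItems are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Foo
{
    public string Name { get; set; }
    public string[] Attributes { get; set; }
}

public static class DuplicateFinder
{
    // Hypothetical helper: returns every item that shares at least one
    // attribute value with an item that has a different name.
    public static List<Foo> FindSharedAttributeItems(IEnumerable<Foo> items)
    {
        return items
            // Flatten to (item, value) pairs, one per distinct attribute value.
            .SelectMany(item => item.Attributes.Distinct(), (item, value) => new { item, value })
            // Collect all items that carry the same value.
            .GroupBy(pair => pair.value)
            // Keep only values that appear on items with more than one name.
            .Where(group => group.Select(pair => pair.item.Name).Distinct().Count() > 1)
            .SelectMany(group => group.Select(pair => pair.item))
            .Distinct()
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var mylist = new List<Foo>
        {
            new Foo { Name = "Alpha", Attributes = new[] { "A", "B", "C" } },
            new Foo { Name = "Bravo", Attributes = new[] { "D", "E", "F" } },
            new Foo { Name = "Charlie", Attributes = new[] { "G", "H", "A" } }
        };

        foreach (var foo in DuplicateFinder.FindSharedAttributeItems(mylist))
        {
            Console.WriteLine(foo.Name); // Alpha, then Charlie
        }
    }
}
```

Grouping touches each (item, value) pair once, so it should behave similarly to the join for large lists, though that is worth measuring on real data.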
I have 2 customer lists and I'm trying to get a list of customers that DO NOT have a matching Name property between the 2 lists. I also need to include the customers from the second list that DO match, but that do not have the source property set to "migrated". Basically I would have a list of customer to add and update. I've tried a bunch of ways, but when I add the conditional for Source I end up with the wrong results. I'm doing it this way to have the ability to migrate in batches.
var legacyCustomers = new List<Customer>{
new Customer() { Name = "Customer 1" },
new Customer() { Name = "Customer 2" },
new Customer() { Name = "Customer 3" },
new Customer() { Name = "Customer 4" }
};
var currentCustomers = new List<Customer>{
new Customer() { Name = "Customer 1", Source = "migrated" },
new Customer() { Name = "Customer 2", Source = "migrated" },
new Customer() { Name = "Customer 3", Source = "" }
};
In this scenario I need "Customer 3" and "Customer 4" added to a new Customer list.
Here's a fiddle I've been using https://dotnetfiddle.net/Z0RoFe
Any help is very appreciated.
The code becomes a little bit simpler if we implement IEqualityComparer<Customer>. That means we're creating a class that uses custom logic to determine if two customers are equal.
public class CustomerNameEqualityComparer : IEqualityComparer<Customer>
{
public bool Equals(Customer x, Customer y)
{
return string.Equals(x?.Name, y?.Name, StringComparison.CurrentCultureIgnoreCase);
}
public int GetHashCode(Customer obj)
{
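// Must agree with Equals: names that differ only by case need the same hash code.
return obj?.Name == null ? 0 : StringComparer.CurrentCultureIgnoreCase.GetHashCode(obj.Name);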
}
}
According to this class, two customers are equal if they have the same name. The convenience is that we can use this comparison without actually modifying the Customer class, since this might not always be the way we want to compare customers.
We can do it without this, but it results in a lot of complicated Where functions that compare the names. If you're going to compare items using specific comparison logic then it's easier to create the comparison once and re-use it.
If we did this (given that firstList and secondList are both List<Customer>):
var customersFromFirstListNotInSecondList = firstList.Except(secondList);
It wouldn't work, because Except would compare the customers by reference equality instead of looking for matching names. But if we do this:
var customersFromFirstListNotInSecondList =
firstList.Except(secondList, new CustomerNameEqualityComparer());
It will compare the customers in the two lists just by matching their names.
That comparer class also makes the second step easier to implement:
var matchingButNotMigrated =
firstList.Intersect(secondList, new CustomerNameEqualityComparer())
.Where(customer => customer.Source != "migrated");
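This returns the items that are on both lists (intersection), again comparing using the names. Once it has the items that are on both lists it excludes those that are migrated. Note that Intersect yields the elements of the first sequence, so the list whose items actually carry the Source value must come first.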
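Putting the two steps together for the lists in the question might look like the sketch below. The Customer class with Name and Source properties is assumed from the question; MigrationPlanner and BuildMigrationList are hypothetical names, and the comparer here uses ordinal, case-insensitive matching for both Equals and GetHashCode.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public string Name { get; set; }
    public string Source { get; set; }
}

public class CustomerNameEqualityComparer : IEqualityComparer<Customer>
{
    public bool Equals(Customer x, Customer y) =>
        string.Equals(x?.Name, y?.Name, StringComparison.OrdinalIgnoreCase);

    // Uses the same case-insensitive rule as Equals.
    public int GetHashCode(Customer obj) =>
        obj?.Name == null ? 0 : StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Name);
}

public static class MigrationPlanner
{
    // Hypothetical helper: customers to add, followed by customers to update.
    public static List<Customer> BuildMigrationList(List<Customer> legacy, List<Customer> current)
    {
        var comparer = new CustomerNameEqualityComparer();

        // In legacy but not in current at all: these need to be added.
        var toAdd = legacy.Except(current, comparer);

        // In both lists but not yet migrated. "current" comes first because
        // Intersect yields elements of the first sequence, which hold Source.
        var toUpdate = current.Intersect(legacy, comparer)
                              .Where(c => c.Source != "migrated");

        return toAdd.Concat(toUpdate).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var legacyCustomers = new List<Customer>
        {
            new Customer { Name = "Customer 1" },
            new Customer { Name = "Customer 2" },
            new Customer { Name = "Customer 3" },
            new Customer { Name = "Customer 4" }
        };
        var currentCustomers = new List<Customer>
        {
            new Customer { Name = "Customer 1", Source = "migrated" },
            new Customer { Name = "Customer 2", Source = "migrated" },
            new Customer { Name = "Customer 3", Source = "" }
        };

        var result = MigrationPlanner.BuildMigrationList(legacyCustomers, currentCustomers);
        Console.WriteLine(string.Join(", ", result.Select(c => c.Name))); // Customer 4, Customer 3
    }
}
```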
Simply add the extra condition to your Where clause (i.e. where the names don't match or the source does not equal "migrated"):
var migrateList = legacyCustomers
.Where(c => currentCustomers.All(c2 =>
!string.Equals(c2.Name, c.Name, StringComparison.CurrentCultureIgnoreCase) ||
!string.Equals(c2.Source, "migrated", StringComparison.CurrentCultureIgnoreCase)
))
.ToList();
You literally just need to add a logical OR and check for the "migrated" string.
var migrateList = legacyCustomers.Where(c => currentCustomers.All(c2 =>
!string.Equals(c2.Name, c.Name, StringComparison.CurrentCultureIgnoreCase)
|| c2.Source != "migrated")).ToList();
There's no need for any additional check on the name, since customers whose names match nothing are already included; the extra condition only adds customers whose names do match but whose Source is not "migrated".
Using the Except method gives an O(n) solution, since it builds a hash set rather than comparing every pair.
var comparer = new CustomerNameEqualityComparer();
var results = legacyCustomers
.Except(currentCustomers.Where(customer => customer.Source == "migrated"), comparer);
Console.WriteLine($"Result: {String.Join(", ", results.Select(c => c.Name))}");
Output:
Result: Customer 3, Customer 4
I am using the elegant CustomerNameEqualityComparer class created by @Scott Hannen. 😃
public class CustomerNameEqualityComparer : IEqualityComparer<Customer>
{
public bool Equals(Customer x, Customer y)
{
return string.Equals(x?.Name, y?.Name, StringComparison.OrdinalIgnoreCase);
}
public int GetHashCode(Customer obj)
{
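// Must agree with Equals: names that differ only by case need the same hash code.
return obj?.Name == null ? 0 : StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Name);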
}
}
I put the following code segment in .NET Fiddle, but it printed out System.Linq.Enumerable+WhereArrayIterator`1[System.String]. I'd like to print out each item in result, in order to understand how Select works. Can someone please point out what the problem is? Many thanks!
string[] sequ1 = { "abcde", "fghi", "jkl", "mnop", "qrs" };
string[] sequ2 = { "abc", "defgh", "ijklm", "nop" };
var result =sequ1.Select( n1 => sequ2.Where(n2 => n1.Length < n2.Length) );
foreach( var y in result)
{
Console.WriteLine(y);
}
You are actually returning a collection of collections.
sequ1.Select( n1 => sequ2.Where(n2 => n1.Length < n2.Length) );
For each element in sequ1, this statement filters sequ2 to find all of the elements from the second sequence where the current value in the first sequence is shorter than it and then maps to a new collection containing each of those results.
To describe what Select is actually doing:
You start with a collection of things. In your case: sequ1 which has type IEnumerable<string>
You supply it with a function. This function takes one argument, of the same type as the elements of your collection, and returns some other type. In your case:
n1 => sequ2.Where(n2 => n1.Length < n2.Length)
Your function takes a string and returns an IEnumerable<string>
Finally, it returns a result containing a collection of each element in the original collection transformed to some new element by the function you supplied it with.
So you started with IEnumerable<string> and ended up with IEnumerable<IEnumerable<string>>.
That means you have a collection for each value that appears in sequ1.
As such, you would expect the result to be:
{{}, {"defgh", "ijklm"}, {"defgh", "ijklm"}, {"defgh", "ijklm"}, {"defgh", "ijklm"}}
You can inspect the results by adding another loop.
foreach (var y in result)
{
    // Iterate the inner collection "y", not "result" again.
    foreach (var z in y)
    {
        Console.WriteLine(z);
    }
}
Change your Select to SelectMany:
var result = sequ1.SelectMany(n1 => sequ2.Where(n2 => n1.Length < n2.Length));
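To make the difference concrete, here is a small sketch that runs both versions on the arrays from the question. SelectDemo, Nested, and Flattened are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectDemo
{
    // Select keeps one inner collection per element of sequ1.
    public static List<List<string>> Nested(string[] sequ1, string[] sequ2) =>
        sequ1.Select(n1 => sequ2.Where(n2 => n1.Length < n2.Length).ToList()).ToList();

    // SelectMany concatenates those inner collections into one flat sequence.
    public static List<string> Flattened(string[] sequ1, string[] sequ2) =>
        sequ1.SelectMany(n1 => sequ2.Where(n2 => n1.Length < n2.Length)).ToList();
}

public static class Program
{
    public static void Main()
    {
        string[] sequ1 = { "abcde", "fghi", "jkl", "mnop", "qrs" };
        string[] sequ2 = { "abc", "defgh", "ijklm", "nop" };

        // One line per element of sequ1; the first line is empty.
        foreach (var inner in SelectDemo.Nested(sequ1, sequ2))
        {
            Console.WriteLine("[" + string.Join(", ", inner) + "]");
        }

        // One line per matching string, with no nesting left.
        foreach (var item in SelectDemo.Flattened(sequ1, sequ2))
        {
            Console.WriteLine(item);
        }
    }
}
```

With SelectMany you lose the information about which element of sequ1 produced each match, so it only fits if the flat list is what you want.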
I may be wrong, but I think the OP wants to compare both arrays and, for each index, print the longer of the two elements.
If that's the case, I would do it as follows:
var result = sequ1.Take(sequ2.Length)
.Select((n1, i) =>
(n1.Length > sequ2.ElementAt(i).Length)
? n1
: sequ2.ElementAt(i));
Explanation:
Use Take to go only as far as the length of the second array, which avoids out-of-range exceptions from ElementAt later on.
Use Select with two arguments: the first is the string, the second is the index.
Use ElementAt to find the corresponding element in sequ2.
I don't know whether this example will help you understand how Select works. I think a simpler example is this:
public class Person {
public string Name { get; set; }
public string LastName { get; set; }
}
public class Test {
public Test() {
List<Person> persons = new List<Person>();
persons.Add(new Person() { Name = "Person1",LastName = "LastName1" });
persons.Add(new Person() { Name = "Person2",LastName = "LastName2" });
var getNamesFromPersons = persons.Select(p => p.Name);
}
}
If you are beginning C#, you should set the keyword "var" aside for now.
Force yourself to write out what the variables really are.
Had you avoided var, you would have seen why your code wrote to the console what it did:
string[] sequ1 = { "abcde", "fghi", "jkl", "mnop", "qrs", };
string[] sequ2 = { "abc", "defgh", "ijklm", "nop", };
IEnumerable<IEnumerable<string>> result = sequ1.Select(n1 => sequ2.Where(n2 => n1.Length < n2.Length));
foreach (IEnumerable<string> y in result)
{
foreach (string z in y)
{
Console.WriteLine(z);
}
}
This is a cosmetics issue, it is syntactical sugar only, just because I want to do it. But basically I have a lot of code that reads like this ... I am using a dynamic for the sheer purpose of time saving and example.
var terms = new List<string> {"alpha", "beta", "gamma", "delta", "epsilon", "rho"};
var list = new List<dynamic> {
new {
Foo = new List<string> {
"alpha",
"gamma"
}
}
};
var result = list
.Where(
n => n.Foo.Contains(terms[0]) ||
n.Foo.Contains(terms[1]) ||
n.Foo.Contains(terms[2]) ||
n.Foo.Contains(terms[3]) ||
n.Foo.Contains(terms[4]) ||
n.Foo.Contains(terms[5]))
.ToList();
Obviously this is a ludicrous hyperbole for the sheer sake of example, more accurate code is ...
Baseline =
Math.Round(Mutations.Where(n => n.Format.Is("Numeric"))
.Where(n => n.Sources.Contains("baseline") || n.Sources.Contains("items"))
.Sum(n => n.Measurement), 3);
But the basic point is that I have plenty of places where I want to check to see if a List<T> (usually List<string>, but there may at times be other objects. string is my present focus though) contains any of the items from another List<T>.
I thought that using .Any() would work, but I actually haven't been able to get that to function as expected. So far, only excessive "||" is all that yields the correct results.
This is functionally fine, but it is annoying to write. I wanted to know if there is a simpler way to write this out - an extension method, or a LINQ method that perhaps I've not understood.
Update
I am really aware that this may be a duplicate, but I am having a hard time figuring out the wording of the question to a level of accuracy that I can find duplicates. Any help is much appreciated.
Solution
Thank you very much for all of the help. This is my completed solution. I am using a dynamic here just to save time on example classes, but several of your proposed solutions worked.
var terms = new List<string> {"alpha", "beta", "gamma", "delta", "epsilon", "rho"};
var list = new List<dynamic> {
new { // should match
Foo = new List<string> {
"alpha",
"gamma"
},
Index = 0
},
new { // should match
Foo = new List<string> {
"zeta",
"beta"
},
Index = 1
},
new { // should not match
Foo = new List<string> {
"omega",
"psi"
},
Index = 2
},
new { // should match
Foo = new List<string> {
"kappa",
"epsilon"
},
Index = 3
},
new { // should not match
Foo = new List<string> {
"sigma"
},
Index = 4
}
};
var results = list.Where(n => terms.Any(t => n.Foo.Contains(t))).ToList();
// expected output - [0][1][3]
results.ForEach(result => {
Console.WriteLine(result.Index);
});
I thought that using .Any() would work, but I actually haven't been able to get that to function as expected.
You should be able to make it work by applying Any() to terms, like this:
var result = list
.Where(n => terms.Any(t => n.Foo.Contains(t)))
.ToList();
You could try this one (it assumes n.Foo is a single string):
var result = list.Where(n => terms.Contains(n.Foo))
    .ToList();
I assume n.Foo is a string rather than a collection, in which case:
var terms = new List<string> { "alpha", "beta", "gamma", "delta", "epsilon", "rho" };
var list = (new List<string> { "alphabet", "rhododendron" })
.Select(x => new { Foo = x });
var result = list.Where(x => terms.Any(y => x.Foo.Contains(y)));
You want to find out if Any element in one collection exists within another collection, so Intersect should work nicely here.
You can modify your second code snippet accordingly:
var sources = new List<string> { "baseline", "items" };
Baseline =
Math.Round(Mutations.Where(n => n.Format.Is("Numeric"))
.Where(n => sources.Intersect(n.Sources).Any())
.Sum(n => n.Measurement), 3);
Regarding the part of the docs you quoted for "Intersect":
Produces the set intersection of two sequences by using the default equality comparer to compare values.
Every object can be compared to an object of the same type, to determine whether they're equal. If it's a custom class you created, you can implement IEqualityComparer and then you get to decide what makes two instances of your class "equal".
In this case, however, we're just comparing strings, nothing special. "Foo" = "Foo", but "Foo" ≠ "Bar".
So in the above code snippet, we intersect the two collections by comparing all the strings in the first collection to all the strings in the second collection. Whichever strings are "equal" in both collections end up in a resulting third collection.
Then we call "Any()" to determine if there are any elements in that third collection, which tells us there was at least one match between the two original collections.
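The Intersect-then-Any idea can be sketched on the sample data from the question as follows. OverlapDemo and SharesAny are hypothetical names, and the tuple list stands in for the dynamic objects used in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OverlapDemo
{
    // True when the two sequences have at least one equal element.
    public static bool SharesAny(IEnumerable<string> first, IEnumerable<string> second) =>
        first.Intersect(second).Any();
}

public static class Program
{
    public static void Main()
    {
        var terms = new List<string> { "alpha", "beta", "gamma", "delta", "epsilon", "rho" };

        // Stand-in for the objects in the question: an index plus a list of strings.
        var records = new List<(int Index, List<string> Foo)>
        {
            (0, new List<string> { "alpha", "gamma" }),
            (1, new List<string> { "zeta", "beta" }),
            (2, new List<string> { "omega", "psi" }),
            (3, new List<string> { "kappa", "epsilon" }),
            (4, new List<string> { "sigma" })
        };

        var matches = records
            .Where(r => OverlapDemo.SharesAny(terms, r.Foo))
            .Select(r => r.Index);

        Console.WriteLine(string.Join(",", matches)); // 0,1,3
    }
}
```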
If performance is an issue when using Any() you can use a regular expression instead. Obviously, you should measure to make sure that the regular expression actually performs faster:
var terms = new List<string> { "alpha", "beta", "gamma", "delta", "epsilon", "rho" };
var regex = new Regex(string.Join("|", terms));
var result = list
.Where(n => regex.Match(n.Foo).Success);
This assumes that joining the terms with "|" creates a valid regular expression, but with simple words that should not be a problem.
One advantage of using a regular expression is that you can require that the terms are surrounded by word boundaries. Also, the predicate in the Where clause may be easier to understand when compared to a solution using Contains inside Any.
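A sketch of the word-boundary variant is below. It escapes each term with Regex.Escape so that the joined pattern stays valid even if a term contains regex metacharacters. TermMatcher and BuildWholeWordRegex are hypothetical names, and the sketch assumes a non-empty list of terms made of word characters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TermMatcher
{
    // Builds \b(?:term1|term2|...)\b from the given terms.
    public static Regex BuildWholeWordRegex(IEnumerable<string> terms)
    {
        var alternatives = string.Join("|", terms.Select(t => Regex.Escape(t)));
        return new Regex(@"\b(?:" + alternatives + @")\b");
    }
}

public static class Program
{
    public static void Main()
    {
        var terms = new List<string> { "alpha", "beta", "gamma", "delta", "epsilon", "rho" };
        var regex = TermMatcher.BuildWholeWordRegex(terms);

        Console.WriteLine(regex.IsMatch("alphabet"));         // False: "alpha" is only a prefix
        Console.WriteLine(regex.IsMatch("an alpha release")); // True: "alpha" is a whole word
    }
}
```

Without the \b anchors, "alphabet" and "rhododendron" would both match, which is what the plain Contains approach does.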
I know there are a lot of examples of this on the web, but I can't seem to get this to work.
Let me try to set this up, I have a list of custom objects that I need to have limited on a range of values.
I have a sort variable that changes based on some action on the UI, and I need to process the object differently based on that.
Here is my object:
MyObject.ID - Just an identifier
MyObject.Cost - The cost of the object.
MyObject.Name - The name of the object.
Now I need to filter this based on a range in the cost, so I will have something similar to this, considering that I could be limiting by Either of my bottom two properties.
var product = from mo in myobject
    where mo.Cost <= 10000
    select mo;
or
var product = from mo in myobject
    where mo.Name == strName
    select mo;
Now I have the dynamic linq in my project, but I'm not figuring out how to get it to actually work, as when I do some of the examples I am only getting:
Func<TSource, bool> predicate
as an option.
Update:
I am trying to find a solution that helps me Objectify my code, as right now it is a lot of copy and paste for my linq queries.
Update 2:
Is there an obvious performance difference between:
var product = from mo in myobject
... a few joins ...
where mo.Cost <= 10000
and
var product = (from mo in myobject
... a few joins ...)
.AsQueryable()
.Where("Cost > 1000")
Maybe not directly answering your question, but DynamicQuery is unnecessary here. You can write this query as:
public IEnumerable<MyObject> GetMyObjects(int? maxCost, string name)
{
    // Declared as IQueryable<MyObject> so the filtered queries can be assigned back to it.
    IQueryable<MyObject> query = context.MyObjects;
    if (maxCost != null)
    {
        query = query.Where(mo => mo.Cost <= maxCost.Value);
    }
    if (!string.IsNullOrEmpty(name))
    {
        query = query.Where(mo => mo.Name == name);
    }
    return query;
}
If the conditions are mutually exclusive then just change the second if into an else if.
I use this pattern all the time. Dynamic LINQ is mainly for building queries from strings at run time; you don't need it just to add conditions on the fly.
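For reference, here is a self-contained sketch of the same pattern against an in-memory list. The MyObject shape is assumed from the question; ProductFilter and Apply are hypothetical names, and AsQueryable() stands in for a real data context.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyObject
{
    public int ID { get; set; }
    public decimal Cost { get; set; }
    public string Name { get; set; }
}

public static class ProductFilter
{
    // Each filter is added only when its argument was supplied.
    public static IQueryable<MyObject> Apply(IQueryable<MyObject> source, decimal? maxCost, string name)
    {
        var query = source;
        if (maxCost.HasValue)
        {
            query = query.Where(mo => mo.Cost <= maxCost.Value);
        }
        if (!string.IsNullOrEmpty(name))
        {
            query = query.Where(mo => mo.Name == name);
        }
        return query;
    }
}

public static class Program
{
    public static void Main()
    {
        var items = new List<MyObject>
        {
            new MyObject { ID = 1, Cost = 500m, Name = "Widget" },
            new MyObject { ID = 2, Cost = 15000m, Name = "Gadget" },
            new MyObject { ID = 3, Cost = 9000m, Name = "Gadget" }
        }.AsQueryable();

        var cheapGadgets = ProductFilter.Apply(items, 10000m, "Gadget");
        Console.WriteLine(string.Join(",", cheapGadgets.Select(mo => mo.ID))); // 3
    }
}
```

Because the filters are only composed and not executed until the result is enumerated, a database provider can still translate the whole thing into a single query.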
using System.Linq;
var byName = myobject.Where(x => x.Name == "xyz");
var byCost = myobject.Where(x => x.Cost <= 1000);
var byEither = myobject.Where(x => x.Name == "xyz" || x.Cost <= 1000);
Read this great post on DLINQ by ScottGu
Dynamic LINQ (Part 1: Using the LINQ Dynamic Query Library)
You would need something like
var product = myobject.Where("Cost <= 10000");
var product = myobject.Where("Name = @0", strName);
If you downloaded the samples you need to find the Dynamic.cs file in the sample. You need to copy this file into your project and then add
using System.Linq.Dynamic; to the class you are trying to use Dynamic Linq in.
EDIT: To answer your edit. Yes, there is of course a performance difference. If you know the variations of filters beforehand then I would suggest writing them out without using DLINQ.
You can create your own Extension Method like so.
public static class FilterExtensions
{
public static IEnumerable<T> AddFilter<T,T1>(this IEnumerable<T> list, Func<T,T1, bool> filter, T1 argument )
{
return list.Where(foo => filter(foo, argument) );
}
}
Then create your filter methods.
public bool FilterById(Foo obj, int id)
{
return obj.id == id;
}
public bool FilterByName(Foo obj, string name)
{
return obj.name == name;
}
Now you can use this on an IEnumerable<Foo> very easily.
List<Foo> foos = new List<Foo>();
foos.Add(new Foo() { id = 1, name = "test" });
foos.Add(new Foo() { id = 1, name = "test1" });
foos.Add(new Foo() { id = 2, name = "test2" });
//Example 1
//get all Foo's by Id == 1
var list1 = foos.AddFilter(FilterById, 1);
//Example 2
//get all Foo's by name == "test1"
var list2 = foos.AddFilter(FilterByName, "test1");
//Example 3
//get all Foo's by Id and Name
var list3 = foos.AddFilter(FilterById, 1).AddFilter(FilterByName, "test1");
I've got a list of People that are returned from an external app and I'm creating an exclusion list in my local app to give me the option of manually removing people from the list.
I have a composite key which I have created that is common to both, and I want to find an efficient way of removing people from my List<Person> using my List<Exclusions>.
e.g
class Person
{
    public string compositeKey { get; set; }
}
class Exclusions
{
    public string compositeKey { get; set; }
}
List<Person> people = GetFromDB();
List<Exclusions> exclusions = GetFromOtherDB();
// Wanted: filteredResults = people minus exclusions, using the composite key as the comparer
I thought LINQ was the ideal way of doing this but after trying joins, extension methods, using yields, etc. I'm still having trouble.
If this were SQL I would use a not in (?,?,?) query.
Have a look at the Except method, which you use like this:
var resultingList =
listOfOriginalItems.Except(listOfItemsToLeaveOut, equalityComparer)
You'll want to use the overload I've linked to, which lets you specify a custom IEqualityComparer. That way you can specify how items match based on your composite key. (If you've already overridden Equals, though, you shouldn't need the IEqualityComparer.)
Edit: Since it appears you're using two different types of classes, here's another way that might be simpler. Assuming a List<Person> called persons and a List<Exclusions> called exclusions:
var exclusionKeys =
exclusions.Select(x => x.compositeKey);
var resultingPersons =
persons.Where(x => !exclusionKeys.Contains(x.compositeKey));
In other words: Select from exclusions just the keys, then pick from persons all the Person objects that don't have any of those keys.
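If the exclusion list is large, one possible refinement is to put the keys in a HashSet<string> first, so that each lookup does not rescan the whole exclusion list. The sketch below assumes the two classes from the question; ExclusionFilter and RemoveExcluded are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string compositeKey { get; set; }
}

public class Exclusions
{
    public string compositeKey { get; set; }
}

public static class ExclusionFilter
{
    // Returns the people whose key does not appear in the exclusion list.
    public static List<Person> RemoveExcluded(IEnumerable<Person> people, IEnumerable<Exclusions> exclusions)
    {
        // Materialize the keys once; HashSet.Contains is a hash lookup.
        var excludedKeys = new HashSet<string>(exclusions.Select(e => e.compositeKey));
        return people.Where(p => !excludedKeys.Contains(p.compositeKey)).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { compositeKey = "1" },
            new Person { compositeKey = "2" },
            new Person { compositeKey = "3" },
            new Person { compositeKey = "4" }
        };
        var exclusions = new List<Exclusions>
        {
            new Exclusions { compositeKey = "3" },
            new Exclusions { compositeKey = "4" }
        };

        var filtered = ExclusionFilter.RemoveExcluded(people, exclusions);
        Console.WriteLine(string.Join(",", filtered.Select(p => p.compositeKey))); // 1,2
    }
}
```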
I would just use the FindAll method on the List class. i.e.:
List<Person> filteredResults =
people.FindAll(p => !exclusions.Any(e => e.compositeKey == p.compositeKey));
Not sure if the syntax will exactly match your objects, but I think you can see where I'm going with this.
Many thanks for this guys.
I managed to get this down to one statement:
var results = from p in People
where !(from e in exclusions
select e.CompositeKey).Contains(p.CompositeKey)
select p;
Thanks again everyone.
var thisList = new List<string>{ "a", "b", "c" };
var otherList = new List<string> {"a", "b"};
var theOnesThatDontMatch = thisList
.Where(item=> otherList.All(otherItem=> item != otherItem))
.ToList();
var theOnesThatDoMatch = thisList
.Where(item=> otherList.Any(otherItem=> item == otherItem))
.ToList();
Console.WriteLine("don't match: {0}", string.Join(",", theOnesThatDontMatch));
Console.WriteLine("do match: {0}", string.Join(",", theOnesThatDoMatch));
//Output:
//don't match: c
//do match: a,b
Adapt the list types and lambdas accordingly, and you can filter out anything.
https://dotnetfiddle.net/6bMCvN
You can use the "Except" extension method (see http://msdn.microsoft.com/en-us/library/bb337804.aspx)
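In your code (this needs both lists to hold the same type, with equality defined on the key, or a custom comparer):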
var difference = people.Except(exclusions);
I couldn't figure out how to do this in pure MS LINQ, so I wrote my own extension method to do it:
public static bool In<T>(this T objToCheck, params T[] values)
{
if (values == null || values.Length == 0)
{
return false; //early out
}
else
{
foreach (T t in values)
{
if (object.Equals(t, objToCheck)) // null-safe comparison
return true; //RETURN found!
}
return false; //nothing found
}
}
I would do something like this, but I bet there is a simpler way. I think the SQL that LINQ to SQL generates would use a SELECT from person WHERE NOT EXISTS (SELECT from your exclusion list).
static class Program
{
public class Person
{
public string Key { get; set; }
public Person(string key)
{
Key = key;
}
}
public class NotPerson
{
public string Key { get; set; }
public NotPerson(string key)
{
Key = key;
}
}
static void Main()
{
List<Person> persons = new List<Person>()
{
new Person ("1"),
new Person ("2"),
new Person ("3"),
new Person ("4")
};
List<NotPerson> notpersons = new List<NotPerson>()
{
new NotPerson ("3"),
new NotPerson ("4")
};
var filteredResults = from n in persons
where !notpersons.Any(y => n.Key == y.Key)
select n;
foreach (var item in filteredResults)
{
Console.WriteLine(item.Key);
}
}
}
The LINQ below performs a left outer join (and generates the matching SQL when run against a database provider), then keeps only the people that find no match in your exclusion list.
List<Person> filteredResults = (from p in people
    join e in exclusions on p.compositeKey equals e.compositeKey into temp
    from t in temp.DefaultIfEmpty()
    where t == null // no matching exclusion was found
    select p).ToList();
Let me know if it works!
var result = Data.Where(x =>
{
bool condition = true;
double accord = (double)x[Table.Columns.IndexOf(FiltercomboBox.Text)];
return condition && accord >= double.Parse(FilterLowertextBox.Text) && accord <= double.Parse(FilterUppertextBox.Text);
});