C# efficient ContainsAll with duplicates?


I'm looking for something similar to the existing "ContainsAll" methods, but specifically I want to check that any duplicates in the sublist are also present in the main list.
For example, I have these lists:
List<int> a = new List<int> { 1, 2, 1, 3 };
List<int> b = new List<int> { 1, 1, 2 };
List<int> c = new List<int> { 2, 2, 3 };
What I want is a function bool ContainsAll(List<T> l1, List<T> l2) such that ContainsAll(a, b) == true (since the duplicate 1's are common to both lists) but ContainsAll(a, c) == false (since list a doesn't have multiple 2's).
I could of course search manually through the main list, removing the items as I find them. However this would require duplicating the list (since I don't want to modify it) and I was hoping for a cleaner/faster approach if one exists.
ETA: I need to check that each element occurs in the larger list at least as many times as it occurs in the smaller list. It's not enough that there are multiples in both lists; each element from the smaller list must be paired with a unique element in the larger list.
I don't have a specific performance requirement. I really just want to know if there is a more "correct" way than the manual check. You could argue that "correct" may mean faster, more readable, or simply easier to write by using built-in functions. Maybe there isn't a better way. I will add that my use case may involve checking the larger list against several smaller lists, so a one-time transformation of the larger list (e.g. to a dictionary) is certainly a consideration.
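For comparison, the manual approach described in the question (copy the main list, then remove one match for each element of the sublist) might look like the minimal sketch below. The name ContainsAllByRemoval is hypothetical. Because List<T>.Remove is a linear scan, this costs O(n·m) for list sizes n and m, which is the cost the counting approaches avoid.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the "copy and remove" check from the question.
bool ContainsAllByRemoval<T>(List<T> main, List<T> sub)
{
    // Work on a copy so the caller's list is left untouched.
    var remaining = new List<T>(main);
    foreach (var item in sub)
    {
        // Remove deletes the first equal element and returns false if none is left,
        // so each sublist element consumes exactly one element of the main list.
        if (!remaining.Remove(item)) return false;
    }
    return true;
}
```

With the lists from the question, ContainsAllByRemoval(a, b) is true and ContainsAllByRemoval(a, c) is false.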

You could count each element of both lists, and then check that the counts of all elements in l2 is less than or equal to the corresponding count in l1.
using System.Collections.Generic;
static Dictionary<T, int> Count<T>(List<T> l)
{
Dictionary<T, int> c = new Dictionary<T, int>();
foreach (var o in l)
{
if (!c.ContainsKey(o))
c[o] = 1;
else
c[o]++;
}
return c;
}
static bool ContainsAll<T>(List<T> l1, List<T> l2)
{
Dictionary<T, int> c1 = Count(l1);
Dictionary<T, int> c2 = Count(l2);
foreach (var kvp2 in c2)
{
// If c1 doesn't contain the current value
// or its count is < the current value's count in l2
// return false
if (!c1.ContainsKey(kvp2.Key) || c1[kvp2.Key] < kvp2.Value)
return false;
}
// All checks were successful, return true
return true;
}
Of course, this approach involves building dictionaries, so you trade memory for speed, but it will be faster than checking with List.Contains(), because a lookup in a list is O(n) whereas a dictionary lookup is O(1) on average.
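Since the question mentions checking one larger list against several smaller lists, the counts of the larger list only need to be built once. A minimal sketch of that split, with hypothetical names CountItems and ContainsAllCounted:

```csharp
using System;
using System.Collections.Generic;

// Count occurrences of each element; build this once for the larger list.
Dictionary<T, int> CountItems<T>(IEnumerable<T> items)
{
    var counts = new Dictionary<T, int>();
    foreach (var item in items)
    {
        counts.TryGetValue(item, out int n); // n is 0 when item is not yet a key
        counts[item] = n + 1;
    }
    return counts;
}

// Check one smaller list against the precomputed counts of the larger list.
bool ContainsAllCounted<T>(IReadOnlyDictionary<T, int> mainCounts, IEnumerable<T> sub)
{
    foreach (var pair in CountItems(sub))
    {
        // Missing element, or not enough copies of it in the larger list.
        if (!mainCounts.TryGetValue(pair.Key, out int available) || available < pair.Value)
            return false;
    }
    return true;
}
```

Usage: var mainCounts = CountItems(a); then call ContainsAllCounted(mainCounts, b), ContainsAllCounted(mainCounts, c), and so on, without recounting a.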

If you turn a into a lookup, you can then ask whether it contains every key of b with a greater than or equal count:
var al = a.ToLookup(x=>x);
return b.ToLookup(x=>x).All(ble => al.Contains(ble.Key) && al[ble.Key].Count() >= ble.Count());
You can optimize this by building only one set of counts, then checking that it can "pay" for the second set of values:
var d = new Dictionary<int, int>();
a.ForEach(x => { if(d.ContainsKey(x)) d[x]++; else d[x] = 1;});
b.ForEach(x => { if(--d[x] < 0) throw new Exception($"More {x} in B than A"); });
In this last line of code, one of two things can happen:
the dictionary built from a does not contain a key from b, so a KeyNotFoundException is thrown;
the count dips below zero (--d[x] returns the new value stored in the dictionary; if it is negative, there are more x in b than in a), so the "More x in B than A" exception is thrown.
Thus, if either exception is thrown, b is not a subset of a (counting duplicates); if neither is thrown, it is.
You can make it exceptionless if you want: expand the second ForEach into a foreach loop, put if(!d.ContainsKey(x) || --d[x] < 0) return false; inside the loop, and return true after the loop.
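The exceptionless variant described in the last paragraph can be sketched as follows (IsSubMultiset is a hypothetical name; the logic is the single-dictionary "pay for b out of a" idea):

```csharp
using System;
using System.Collections.Generic;

// Returns true when every element of b can be matched to a distinct element of a.
bool IsSubMultiset(List<int> a, List<int> b)
{
    // Count each value in a.
    var d = new Dictionary<int, int>();
    foreach (var x in a)
        d[x] = d.TryGetValue(x, out int n) ? n + 1 : 1;

    // Spend one count per element of b.
    foreach (var x in b)
    {
        // Missing key, or more x in b than in a: b cannot be paid for.
        if (!d.ContainsKey(x) || --d[x] < 0) return false;
    }
    return true;
}
```

Note that the dictionary is consumed by the check, so when testing several smaller lists you would rebuild it (or copy it) for each one.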

Related

Avoiding 'System.OutOfMemoryException' while iterating over a IEnumerable<IEnumerable<object>>

I have the following code to get the cheapest list of objects that satisfies the requiredNumbers criteria. This list of objects can have a length from 1 to maxLength, i.e. it can be a combination of 1 to maxLength objects, with repetition allowed. Right now, this iterates over the whole list of combinations (IEnumerable of IEnumerable of OBJECT) fine up to maxLength = 9 and breaks after that with a "System.OutOfMemoryException" at
t1.Concat(new OBJECT[] { t2 })
I tried another approach to solve this (mentioned in the code comments), but that seems to have its own demons. What I understand right now is that I'll have to somehow find the least-priced combination of objects without iterating over the whole list of combinations, which I can't see how to do.
Could someone suggest changes that would let maxLength be higher (much higher, ideally) without hurting performance? Any help is much appreciated. Please let me know if I am not clear.
private static int leastPrice = int.MaxValue;
private IEnumerable<IEnumerable<OBJECT>> CombinationOfObjects(IEnumerable<OBJECT> objects, int length)
{
if (length == 1)
return objects.Select(t => new OBJECT[] { t });
return CombinationOfObjects(objects, length - 1).SelectMany(t => objects, (t1, t2) => t1.Concat(new OBJECT[] { t2 }));
}
//Gets the least priced Valid combination out of all possible
public IEnumerable<OBJECT> GetValidCombination(IEnumerable<OBJECT> list, int maxLength, int[] matArray)
{
IEnumerable<IEnumerable<OBJECT>> tempList = null;
List<IEnumerable<OBJECT>> validList = new List<IEnumerable<OBJECT>>();
for (int i = 1; i <= maxLength; i++)
{
tempList = CombinationOfObjects(list, i);
tempList = from alist in tempList
orderby alist.Sum(x => x.Price)
select alist;
foreach (var lst in tempList)
{
//This check will not be required if the least priced value is returned as soon as found
int price = lst.Sum(c => c.Price);
if (price < leastPrice)
{
if (CheckMaterialSum(lst, matArray))
{
validList.Add(lst);
leastPrice = price;
break;
//return lst;
//returning lst as soon as valid combo is found is fastest
//Con being it also returns the least priced least item containing combo
//i.e. even if a 4 item combo is cheaper than the 2 item combo satisfying the need,
//it'll never even check for the 4 item combo
}
}
}
}
//This whole thing would go too if lst was returned earlier
foreach (IEnumerable<OBJECT> combination in validList)
{
int priceTotal = combination.Sum(combo => combo.Price);
if (priceTotal == leastPrice)
{
return combination;
}
}
return new List<OBJECT>();
}
//Checks if the given combination satisfies the requirement
private bool CheckMaterialSum(IEnumerable<OBJECT> combination, int[] matArray)
{
int[] sumMatProp = new int[matArray.Count()];
for (int i = 0; i < matArray.Count(); i++)
{
sumMatProp[i] = combination.Sum(combo => combo.Numbers[i]);
}
bool isCombinationValid = matArray.Zip(sumMatProp, (requirement, c) => c >= requirement).All(comboValid => comboValid);
return isCombinationValid;
}
static void Main(string[] args)
{
List<OBJECT> testList = new List<OBJECT>();
OBJECT object1 = new OBJECT();
object1.Name = "object1";
object1.Price = 2000;
object1.Numbers = new int[] { 2, 3, 4 };
testList.Add(object1);
OBJECT object2 = new OBJECT();
object2.Name = "object2";
object2.Price = 1900;
object2.Numbers = new int[] { 3, 2, 4 };
testList.Add(object2);
OBJECT object3 = new OBJECT();
object3.Name = "object3";
object3.Price = 1600;
object3.Numbers = new int[] { 4, 3, 2 };
testList.Add(object3);
int[] requiredNumbers = new int[]{10,10,10};
int maxLength = 9;//This is the max length possible, OutOfMemoryException after this
IEnumerable<OBJECT> resultCombination = GetValidCombination(testList, maxLength, requiredNumbers);
}
EDIT
Requirement:
I have a number of objects with several properties, namely Price, Name, and Materials. I need to find a combination of these objects such that the sum of all materials in the combination satisfies the material quantities entered by the user. The combination also needs to have the lowest possible price.
There is a constraint, maxLength, which sets the maximum total number of objects in a combination, i.e. for maxLength = 8, the combination may contain anywhere from 1 to 8 objects.
Approaches tried:
1.
-I find all combinations of objects possible (valid + invalid)
-Iterate over them to find the least priced combination. This goes out of memory while iterating.
2.
-I find all combinations possible (valid + invalid)
-Apply a validity check (i.e if it fulfills the user requirement)
-Add only valid combinations in a List of List
-Iterate over this valid List of lists to find the cheapest list and return that. Also goes out of memory
3.
-I find combinations in increasing order of objects (i.e. first all combinations having 1 object, then 2 then so on...)
-Sort the combinations according to price
-Apply validity check and return the first valid combination
-This works fine performance-wise, but does not always return the cheapest possible combination.
If I could somehow get the optimal solution without iterating over the whole list, that would solve it. But everything I've tried either has to iterate over all combinations or does not produce the optimal solution.
Any help, even with some other approach I haven't thought of, is most welcome.
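One possible direction, sketched here as an assumption rather than a tested fix for the original code: CombinationOfObjects generates every ordering of every combination, but neither the total price nor the material sums depend on order, so it is enough to enumerate multisets (indices in non-decreasing order). Exploring them depth-first while keeping only the best valid combination found so far, and abandoning any branch whose price already reaches that best (a standard branch-and-bound technique, not something from the question), avoids materializing the combinations at all. The sample data mirrors object1..object3 from the question's Main, using tuples in place of the OBJECT class; prices are assumed to be non-negative. The worst case is still exponential, so this only pushes the limit out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data mirroring object1..object3 from the question's Main.
var items = new (string Name, int Price, int[] Numbers)[]
{
    ("object1", 2000, new[] { 2, 3, 4 }),
    ("object2", 1900, new[] { 3, 2, 4 }),
    ("object3", 1600, new[] { 4, 3, 2 }),
};
int[] required = { 10, 10, 10 };
int maxLength = 9;

int bestPrice = int.MaxValue;
List<int> best = null;               // indices (into items) of the cheapest valid combination
var chosen = new List<int>();        // indices picked so far, kept in non-decreasing order
var sums = new int[required.Length]; // running material totals for the chosen objects

void Search(int start, int price)
{
    // Bound: adding objects can only raise the price (prices assumed >= 0).
    if (price >= bestPrice) return;
    if (sums.Zip(required, (s, r) => s >= r).All(ok => ok))
    {
        // Valid and cheaper than anything seen so far; no need to extend it further.
        bestPrice = price;
        best = new List<int>(chosen);
        return;
    }
    if (chosen.Count == maxLength) return;
    // Only indices >= start, so each multiset is visited once instead of once per ordering.
    for (int i = start; i < items.Length; i++)
    {
        chosen.Add(i);
        for (int m = 0; m < sums.Length; m++) sums[m] += items[i].Numbers[m];
        Search(i, price + items[i].Price);
        for (int m = 0; m < sums.Length; m++) sums[m] -= items[i].Numbers[m];
        chosen.RemoveAt(chosen.Count - 1);
    }
}

Search(0, 0);
Console.WriteLine(best == null
    ? "no valid combination"
    : $"{bestPrice}: {string.Join(", ", best.Select(i => items[i].Name))}");
```

For this sample data the cheapest valid combination is one object2 plus three object3 (price 6700); no 3-object combination can reach 10 in the second material.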

Transform sequence in Linq while being aware of each Select/SelectMany result

Here's my problem. I have one specific list, which I'll present as an int[] for simplicity's sake.
int[] a = {1,2,3,4,5};
Suppose I need to transform each item on this list, but depending on the situation, I may return an int or an array of ints.
As an example, suppose I need to return {v} if the value is odd, and {v,v+1} if the value is even. I've done this:
int[] b = a.SelectMany(v => v % 2 == 0 ? new int[] { v, v+1 } : new int[] { v })
.ToArray();
So if I run this, I'll get the expected response:
{1,2,3,3,4,5,5}
See that I have repeating numbers, right? 3 and 5. I don't want those repeating numbers. Now, you may tell me that I can just call .Distinct() after processing the array.
This is the problem. The SelectMany clause is fairly complex (I just made up a simpler example), and I definitely don't want to process 3 if it's already present in the list.
I could check if 3 is present in the original list. But if I got 3 in the SelectMany clause, I don't want to get it again. For instance, if I had this list:
int[] a = {1,2,3,4,5,2};
I would get this:
{1,2,3,3,4,5,5,2,3}
Thus returning v (my original value) and v+1 again at the end. Just so you can understand it better v+1 represents some processing I want to avoid.
Summarizing, this is what I want:
I have a list of objects. (Check)
I need to filter them, and depending on the result, I may need to return more than one object. (Check, used SelectMany)
I need them to be distinct, but I can't do that at the end of the process. I should be able to return just {v} if {v+1} already exists. (Clueless...)
One thing I thought about is writing a custom SelectMany which may suit my needs, but I want to be sure there's no built-in way to do this.
EDIT: I believe I may have misled you with my example. I know how to figure out if v+1 is in a list. To be clear, I have objects with 2 int properties, Id and IdParent. I need to "yield return" all the objects and their parents. But I only have the IdParent, which comes from the objects themselves. I can tell whether v+1 is in the list because I can check whether any object there has the same Id as the IdParent I'm checking.
ANSWER: I ended up using Aggregate, which can be used to do exactly what I'm looking for.
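For the Id/IdParent case described in the edit, a custom iterator with a HashSet can emit each object once and skip the expensive parent lookup whenever the parent has already been produced. This is a minimal sketch under assumptions: objects are modeled as (Id, IdParent) tuples, and the store dictionary and Load function are hypothetical stand-ins for whatever fetches a parent in the real code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: IdParent == null marks a root.
var store = new Dictionary<int, (int Id, int? IdParent)>
{
    [1] = (1, null), [2] = (2, 1), [3] = (3, 2), [4] = (4, 1), [5] = (5, 4),
};
int loads = 0;
// Stand-in for the expensive step the question wants to avoid repeating.
(int Id, int? IdParent) Load(int id) { loads++; return store[id]; }

IEnumerable<(int Id, int? IdParent)> WithParents(IEnumerable<(int Id, int? IdParent)> items)
{
    var seen = new HashSet<int>();
    foreach (var item in items)
    {
        var current = item;
        // Walk up the parent chain; HashSet.Add returns false for an Id already emitted.
        while (seen.Add(current.Id))
        {
            yield return current;
            // Stop at a root, or when the parent was already emitted (so it is never loaded again).
            if (current.IdParent is not int parentId || seen.Contains(parentId)) break;
            current = Load(parentId);
        }
    }
}

var input = new[] { store[3], store[5], store[2] };
var result = WithParents(input).Select(x => x.Id).ToList();
Console.WriteLine(string.Join(",", result));
```

Here result is 3, 2, 1, 5, 4 and Load runs only three times: parent 1 is not reloaded for 4, and the duplicate input 2 is skipped entirely.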

Does this simple loop with the HashSet<int> help?
int[] a = {1,2,3,4,5,2};
var aLookupList = new HashSet<int>();
foreach (int i in a)
{
bool isEven = i % 2 == 0;
if (isEven)
{
aLookupList.Add(i);
aLookupList.Add(i + 1);
}
else
{
aLookupList.Add(i);
}
}
var result = aLookupList.ToArray(); // ToArray needs System.Linq; HashSet<T> does not guarantee enumeration order

What about using the Aggregate method? You won't process numbers that are already in the list, whether they were in the original list or came from applying (v + 1).
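int[] v = { 1, 2, 3, 4, 5, 2 };
var result = v.Aggregate(new List<int>(),
    (acc, next) =>
    {
        // Skip values already produced, whether from the input or from (next + 1)
        if (acc.Contains(next))
            return acc;
        acc.Add(next);
        // Add next + 1 only if it is new; otherwise an input such as { 3, 2 } would yield 3 twice
        if (next % 2 == 0 && !acc.Contains(next + 1))
            acc.Add(next + 1);
        return acc;
    }).ToArray();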

var existing = new HashSet<int>(a);
var result = existing
.Where(v => v % 2 == 0 && !existing.Contains(v + 1))
.Select(v => v + 1)
.Concat(existing)
.ToArray();

As I understand you have this input:
int[] a = {1,2,3,4,5};
And the output should also be {1,2,3,4,5} because you don't want duplicated numbers as you describe.
Because you use an array as input, you can try this code:
var output = a.SelectMany((x,i)=> x % 2 == 0 ? new []{x,x+1} :
i > 0 && a[i-1]==x-1 ? new int[]{} : new []{x});
//if the input is {1,2,4,5}
//The output is also {1,2,3,4,5}

What is the fastest non-LINQ algorithm to 'pair up' matching items from multiple separate lists?

IMPORTANT NOTE
To the people who flagged this as a duplicate, please understand we do NOT want a LINQ-based solution. Our real-world example has several original lists in the tens-of-thousands range and LINQ-based solutions are not performant enough for our needs since they have to walk the lists several times to perform their function, expanding with each new source list.
That is why we are specifically looking for a non-LINQ algorithm, such as the one suggested in this answer below where they walk all lists simultaneously, and only once, via enumerators. That seems to be the best so far, but I am wondering if there are others.
Now back to the question...
For the sake of explaining our issue, consider this hypothetical problem:
I have multiple lists, but to keep this example simple, let's limit it to two, ListA and ListB, both of which are of type List<int>. Their data is as follows:
List A List B
1 2
2 3
4 4
5 6
6 8
8 9
9 10
...however the real lists can have tens of thousands of rows.
We next have a class called ListPairing that's simply defined as follows:
public class ListPairing
{
public int? ASide { get; set; }
public int? BSide { get; set; }
}
where each 'side' parameter really represents one of the lists. (i.e. if there were four lists, it would also have a CSide and a DSide.)
What we are trying to do is construct a List<ListPairing> with the data initialized as follows:
A Side B Side
1 -
2 2
- 3
4 4
5 -
6 6
8 8
9 9
- 10
Again, note there is no row with '7'
As you can see, the results look like a full outer join. However, please see the update below.
Now to get things started, we can simply do this...
var finalList = ListA.Select(valA => new ListPairing(){ ASide = valA} );
Which yields...
A Side B Side
1 -
2 -
4 -
5 -
6 -
8 -
9 -
And now we want to go back and fill in the values from List B. This requires first checking whether there is an existing ListPairing whose ASide matches the B value and, if so, setting its BSide.
If there is no existing ListPairing with a matching ASide, a new ListPairing is instantiated with only the BSide set (ASide is blank.)
However, I get the feeling that's not the most efficient way to do this considering all of the required 'FindFirst' calls it would take. (These lists can be tens of thousands of items long.)
However, taking a union of those lists once up front yields the following values...
1, 2, 3, 4, 5, 6, 8, 9, 10 (Note there is no #7)
My thinking was to somehow use that ordered union of the values, then 'walking' both lists simultaneously, building up ListPairings as needed. That eliminates repeated calls to FindFirst, but I'm wondering if that's the most efficient way to do this.
Thoughts?
Update
People have suggested this is a duplicate of getting a full outer join using LINQ because the results are the same...
I am not after a LINQ full outer join. I'm after a performant algorithm.
As such, I have updated the question.
The reason I bring this up is the LINQ needed to perform that functionality is much too slow for our needs. In our model, there are actually four lists, and each can be in the tens of thousands of rows. That's why I suggested the 'Union' approach of the IDs at the very end to get the list of unique 'keys' to walk through, but I think the posted answer on doing the same but with the enumerators is an even better approach as you don't need the list of IDs up front. This would yield a single pass through all items in the lists simultaneously which would easily out-perform the LINQ-based approach.

This didn't turn out as neat as I'd hoped, but if both input lists are sorted then you can just walk through them together comparing the head elements of each one: if they're equal then you have a pair, else emit the smallest one on its own and advance that list.
public static IEnumerable<ListPairing> PairUpLists(IEnumerable<int> sortedAList,
IEnumerable<int> sortedBList)
{
// Dispose of both enumerators when iteration finishes or is abandoned
// (using declarations require C# 8 or later).
using var aEnum = sortedAList.GetEnumerator();
using var bEnum = sortedBList.GetEnumerator();
bool haveA = aEnum.MoveNext();
bool haveB = bEnum.MoveNext();
while (haveA && haveB)
{
// We still have values left on both lists.
int comparison = aEnum.Current.CompareTo(bEnum.Current);
if (comparison < 0)
{
// The heads of the two remaining sequences do not match and A's is
// lower. Generate a partial pair with the head of A and advance the
// enumerator.
yield return new ListPairing() {ASide = aEnum.Current};
haveA = aEnum.MoveNext();
}
else if (comparison == 0)
{
// The heads of the two sequences match. Generate a pair.
yield return new ListPairing() {
ASide = aEnum.Current,
BSide = bEnum.Current
};
// Advance both enumerators
haveA = aEnum.MoveNext();
haveB = bEnum.MoveNext();
}
else
{
// No match and B is the lowest. Generate a partial pair with B.
yield return new ListPairing() {BSide = bEnum.Current};
// and advance the enumerator
haveB = bEnum.MoveNext();
}
}
if (haveA)
{
// We still have elements on list A but list B is exhausted.
do
{
// Generate a partial pair for all remaining A elements.
yield return new ListPairing() { ASide = aEnum.Current };
} while (aEnum.MoveNext());
}
else if (haveB)
{
// List A is exhausted but we still have elements on list B.
do
{
// Generate a partial pair for all remaining B elements.
yield return new ListPairing() { BSide = bEnum.Current };
} while (bEnum.MoveNext());
}
}
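The question mentions four lists rather than two. The same simultaneous walk generalizes to any number of sorted lists: at each step take the smallest head among the lists that still have elements, put it in the slot of every list whose head equals it, and advance only those lists. A minimal sketch, assuming each list is sorted ascending and contains no duplicates; PairUpMany is a hypothetical name, and an int?[] row stands in for ListPairing (slot 0 = ASide, slot 1 = BSide, and so on). It avoids LINQ in the loop and makes a single pass over each list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<int?[]> PairUpMany(params IEnumerable<int>[] sortedLists)
{
    int k = sortedLists.Length;
    var enums = new IEnumerator<int>[k];
    var has = new bool[k]; // has[i] is true while list i still has a current element
    try
    {
        for (int i = 0; i < k; i++)
        {
            enums[i] = sortedLists[i].GetEnumerator();
            has[i] = enums[i].MoveNext();
        }
        while (true)
        {
            // Find the smallest head among the lists that are not exhausted.
            bool any = false;
            int min = 0;
            for (int i = 0; i < k; i++)
            {
                if (has[i] && (!any || enums[i].Current < min))
                {
                    min = enums[i].Current;
                    any = true;
                }
            }
            if (!any) yield break; // every list is exhausted

            // One row per distinct value; only lists whose head equals min contribute and advance.
            var row = new int?[k];
            for (int i = 0; i < k; i++)
            {
                if (has[i] && enums[i].Current == min)
                {
                    row[i] = min;
                    has[i] = enums[i].MoveNext();
                }
            }
            yield return row;
        }
    }
    finally
    {
        foreach (var e in enums) e?.Dispose();
    }
}
```

Each step scans all k heads, so the total cost is O(k·N) for N rows, which is fine for a handful of lists; for many lists a priority queue over the heads would be the usual refinement.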

var list1 = new List<int?>(){1,2,4,5,6,8,9};
var list2 = new List<int?>(){2,3,4,6,8,9,10};
var left = from i in list1
join k in list2 on i equals k
into temp
from k in temp.DefaultIfEmpty()
select new {a = i, b = (i == k) ? k : (int?)null};
var right = from k in list2
join i in list1 on k equals i
into temp
from i in temp.DefaultIfEmpty()
select new {a = (i == k) ? i : (int?)i , b = k};
var result = left.Union(right);
If you need the ordering to be the same as in your example, you will need to carry an index, order by it, and then remove duplicates:
var indexed = left.Select((o,i) => new {o.a, o.b, i}).Union(right.Select((o, i) => new {o.a, o.b, i})).OrderBy( o => o.i);
var ordered = indexed.Select( o => new {o.a, o.b}).Distinct();

Is there a C# equivalent to C++ std::partial_sort?

I'm trying to implement a paging algorithm for a dataset sortable via many criteria. Unfortunately, while some of those criteria can be implemented at the database level, some must be done at the app level (we have to integrate with another data source). We have a paging (actually infinite scroll) requirement and are looking for a way to minimize the pain of sorting the entire dataset at the app level with every paging call.
What is the best way to do a partial sort, only sorting the part of the list that absolutely needs to be sorted? Is there an equivalent to C++'s std::partial_sort function available in the .NET libraries? How should I go about solving this problem?
EDIT: Here's an example of what I'm going for:
Let's say I need to get elements 21-40 of a 1000 element set, according to some sorting criteria. In order to speed up the sort, and since I have to go through the whole dataset every time anyway (this is a web service over HTTP, which is stateless), I don't need the whole dataset ordered. I only need elements 21-40 to be correctly ordered. It is sufficient to create 3 partitions: Elements 1-20, unsorted (but all less than element 21); elements 21-40, sorted; and elements 41-1000, unsorted (but all greater than element 40).
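C++ std::partial_sort is typically heap-based. A rough .NET analogue for the "elements 21-40" example, assuming .NET 6 or later for PriorityQueue, is to keep only the 40 smallest items in a bounded max-heap while streaming the data once, then sort just those. SortedPage is a hypothetical name; unlike std::partial_sort it returns the page rather than rearranging the original collection, and it costs O(n log(skip + take)).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Returns the items at sorted positions [skip, skip + take) without sorting everything.
List<T> SortedPage<T>(IEnumerable<T> source, int skip, int take, IComparer<T> comparer)
{
    int keep = skip + take;
    // Invert the comparer so the heap's top (Peek) is the largest of the kept items.
    var heap = new PriorityQueue<T, T>(Comparer<T>.Create((x, y) => comparer.Compare(y, x)));
    foreach (var item in source)
    {
        if (heap.Count < keep)
            heap.Enqueue(item, item);
        else if (keep > 0 && comparer.Compare(item, heap.Peek()) < 0)
            heap.EnqueueDequeue(item, item); // add the smaller item, drop the current largest
    }
    // Only `keep` items remain, so sorting them is cheap.
    return heap.UnorderedItems.Select(e => e.Element)
               .OrderBy(x => x, comparer)
               .Skip(skip).Take(take)
               .ToList();
}
```

For the example above, SortedPage(data, 20, 20, comparer) would return elements 21-40 in order.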

OK. Here's what I would try based on what you said in reply to my comment.
I want to be able to say "4th through 6th" and get something like: 3, 2, 1 (unsorted, but all less than proper 4th element); 4, 5, 6 (sorted and in the same place they would be for a sorted list); 8, 7, 9 (unsorted, but all greater than proper 6th element).
Let's add 10 to our list to make it easier: 10, 9, 8, 7, 6, 5, 4, 3, 2, 1.
So, what you could do is use the quickselect algorithm to find the ith and kth smallest elements. In your case above, i is 4 and k is 6, which of course returns the values 4 and 6. That takes two quickselect passes through your list, so so far the expected runtime is O(2n) = O(n). The next part is easy. We now have lower and upper bounds on the data we care about. All we need to do is make another pass through our list looking for any element between our lower and upper bounds (inclusive). Whenever we find such an element, we add it to a new List. Finally, we sort that List, which contains only the ith through kth elements we care about.
So, I believe the total expected runtime ends up being O(n) + O((k-i) log(k-i)).
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static void Main(string[] args) {
    //create a list of 10 million items that are randomly ordered
    var list = Enumerable.Range(1, 10000000).OrderBy(x => Guid.NewGuid()).ToList();
    var sw = Stopwatch.StartNew();
    var slowOrder = list.OrderBy(x => x).Skip(10).Take(10).ToList();
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
    //Took ~8 seconds on my machine
    sw.Restart();
    var smallVal = Quickselect(list, 11); // 11th smallest (k is 1-based)
    var largeVal = Quickselect(list, 20); // 20th smallest
    // Where/OrderBy are deferred, so this timing covers only the two Quickselect calls;
    // append .ToList() to include the filtering pass and the small sort.
    var elements = list.Where(el => el >= smallVal && el <= largeVal).OrderBy(el => el);
    Console.WriteLine(sw.ElapsedMilliseconds);
    //Took ~1 second on my machine
}

private static readonly Random rand = new Random();

public static T Quickselect<T>(IList<T> list, int k) where T : IComparable<T> {
    // Pick a random pivot and split the list around it
    T pivot = list[rand.Next(0, list.Count)];
    List<T> smaller = new List<T>();
    List<T> larger = new List<T>();
    foreach (T element in list) {
        // CompareTo only guarantees the sign of its result, not exactly -1 or 1
        int comparison = element.CompareTo(pivot);
        if (comparison < 0) {
            smaller.Add(element);
        }
        else if (comparison > 0) {
            larger.Add(element);
        }
    }
    if (k <= smaller.Count) {
        return Quickselect(smaller, k);
    }
    else if (k > list.Count - larger.Count) {
        return Quickselect(larger, k - (list.Count - larger.Count));
    }
    else {
        // The kth element is equal to the pivot
        return pivot;
    }
}

You can use List<T>.Sort(int, int, IComparer<T>):
inputList.Sort(startIndex, count, Comparer<T>.Default);

Array.Sort() has an overload that accepts index and length arguments, which lets you sort a subset of an array. The same exists for List<T>.
You cannot sort an IEnumerable directly, of course.
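On its own, List<T>.Sort(index, count, comparer) only orders whatever elements currently sit in that range; it does not bring the correct 21st-40th smallest elements there. Combined with an in-place selection step (a swapped-in technique resembling C++ std::nth_element, not a .NET API), it produces exactly the three partitions described in the question. A minimal sketch for List<int>; Partition, NthElement and PartialSortRange are hypothetical names, and the sketch assumes 0 <= start and start + count <= list.Count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Lomuto partition of list[lo..hi] around list[pivotIndex]; returns the pivot's final index.
int Partition(List<int> list, int lo, int hi, int pivotIndex)
{
    int pivot = list[pivotIndex];
    (list[pivotIndex], list[hi]) = (list[hi], list[pivotIndex]); // park the pivot at the end
    int store = lo;
    for (int i = lo; i < hi; i++)
    {
        if (list[i] < pivot)
        {
            (list[i], list[store]) = (list[store], list[i]);
            store++;
        }
    }
    (list[store], list[hi]) = (list[hi], list[store]); // move the pivot into place
    return store;
}

// Afterwards list[n] holds its sorted-order value; nothing in lo..n-1 is greater
// and nothing in n+1..hi is smaller (similar to C++ std::nth_element).
void NthElement(List<int> list, int lo, int hi, int n, Random rng)
{
    while (lo < hi)
    {
        int p = Partition(list, lo, hi, rng.Next(lo, hi + 1));
        if (n == p) return;
        if (n < p) hi = p - 1; else lo = p + 1;
    }
}

// Puts the correctly sorted elements for positions [start, start + count) into that range.
void PartialSortRange(List<int> list, int start, int count, Random rng)
{
    if (count <= 0) return;
    int last = start + count - 1;
    NthElement(list, 0, list.Count - 1, start, rng);                        // fix the lower boundary
    if (last > start) NthElement(list, start + 1, list.Count - 1, last, rng); // fix the upper boundary
    list.Sort(start, count, Comparer<int>.Default);                         // sort only the page
}
```

After PartialSortRange(list, 20, 20, rng), positions 20-39 hold elements 21-40 in order, everything before them is smaller and everything after them is larger, with expected O(n + count log count) work.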

Select items from List of structs

I've got a List of structs. Each struct has a field x. I would like to select the structs that are close to each other by x. In other words, I'd like to cluster them by x.
I guess there should be a one-line solution.
Thanks in advance.

If I understood correctly what you want, then you might need to sort your list by the structure's field X.
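Sorting helps because, along a single axis, "close to each other" clusters can be found in one pass over the sorted items: start a new cluster whenever the gap to the previous item exceeds a threshold. In one dimension this gives the same groups as the graph-style merging approach below. A minimal sketch, assuming a numeric threshold and using value tuples (which are structs) in place of the question's struct; ClusterByX is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<List<(string Name, double X)>> ClusterByX(IEnumerable<(string Name, double X)> items, double threshold)
{
    var clusters = new List<List<(string Name, double X)>>();
    List<(string Name, double X)> currentCluster = null;
    double previousX = 0;
    foreach (var item in items.OrderBy(i => i.X))
    {
        // A gap larger than the threshold ends the current cluster.
        if (currentCluster == null || item.X - previousX > threshold)
        {
            currentCluster = new List<(string Name, double X)>();
            clusters.Add(currentCluster);
        }
        currentCluster.Add(item);
        previousX = item.X;
    }
    return clusters;
}
```

For example, x values 1, 10, 1.5, 11 and 30 with a threshold of 2 give three clusters: {1, 1.5}, {10, 11} and {30}.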

Look at the GroupBy extension method:
var items = mylist.GroupBy(c => c.X);
This article gives a lot of examples using group by.

If you're doing graph-style clustering, the easiest way to do it is by building up a list of clusters which is initially empty. Then loop over the input and, for each value, find all of the clusters which have at least one element which is close to the current value. All those clusters should then be merged together with the value. If there aren't any, then the value goes into a cluster all by itself.
Here is some sample code for how to do it with a simple list of integers.
IEnumerable<int> input;
int threshold;
List<List<int>> clusters = new List<List<int>>();
foreach(var current in input)
{
// Search the current list of clusters for ones which contain at least one
// entry whose difference from current is at most the threshold
var matchingClusters =
clusters.Where(
cluster => cluster.Any(
val => Math.Abs(current - val) <= threshold)
).ToList();
// Merge all the clusters that were found, plus current, into a new cluster.
// Replace all the existing clusters with this new one.
IEnumerable<int> newCluster = new List<int>(new[] { current });
foreach (var match in matchingClusters)
{
clusters.Remove(match);
newCluster = newCluster.Concat(match);
}
clusters.Add(newCluster.ToList());
}
