I'd like to know if there is a better way (still using LINQ) of achieving the following, which checks that both lists contain the same numbers:
var list1 = new int[] { 1, 2, 3, 4 };
var list2 = new int[] { 2, 1, 3, 4 };
return list1.Intersect(list2).Count() == list2.Count();
The above example would return true
I would use two HashSet<int> and the SetEquals method:
var l1Lookup = new HashSet<int>(list1);
var l2Lookup = new HashSet<int>(list2);
bool containsSame = l1Lookup.SetEquals(l2Lookup); // true
The SetEquals method ignores duplicate entries and the order of
elements in the other parameter. If the collection represented by
other is a HashSet collection with the same equality comparer as
the current HashSet object, this method is an O(n) operation.
Otherwise, this method is an O(n + m) operation, where n is the number
of elements in other and m is Count.
Your Count() approach can be inefficient if the sequences are large and/or they are not collections but expensive queries. It can also be incorrect, because Intersect removes duplicates, so the count of intersecting items does not necessarily equal the count of all items.
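To make the correctness point concrete, here is a minimal sketch that puts the two checks side by side. The class and method names (SameNumbersDemo, CountApproach, SetApproach) are hypothetical and exist only for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SameNumbersDemo
{
    // The check from the question: distinct intersection size vs. list2's length.
    public static bool CountApproach(IEnumerable<int> list1, IEnumerable<int> list2)
        => list1.Intersect(list2).Count() == list2.Count();

    // Set comparison: true when both sequences hold the same distinct values.
    public static bool SetApproach(IEnumerable<int> list1, IEnumerable<int> list2)
        => new HashSet<int>(list1).SetEquals(list2);
}
```

With list1 = { 1, 2, 3 } and list2 = { 1, 2 }, CountApproach reports a match even though list1 has an extra number, while SetApproach does not. With list1 = { 1, 2 } and list2 = { 1, 1, 2 }, CountApproach reports a mismatch purely because of the duplicate, while SetApproach treats the two as containing the same numbers.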
I think this works but not sure it's efficient enough:
bool isEqual = list1.OrderBy(x=>x).SequenceEqual(list2.OrderBy(x=>x));
Similar to existing "ContainsAll" methods but specifically I want to check that any duplicates from the sublist are also present in the main list.
eg. I have some lists:
List<int> a = new List<int> { 1, 2, 1, 3 };
List<int> b = new List<int> { 1, 1, 2 };
List<int> c = new List<int> { 2, 2, 3 };
What I want is a function bool ContainsAll(List<T> l1, List<T> l2) such that ContainsAll(a, b) == true (since the duplicate 1's are common to both lists) but ContainsAll(a, c) == false (since list a doesn't have multiple 2's).
I could of course search manually through the main list, removing the items as I find them. However this would require duplicating the list (since I don't want to modify it) and I was hoping for a cleaner/faster approach if one exists.
ETA: I need to check that the quantity of each element found in the larger list is at least as many as in the smaller list. Not merely that there are multiple in both lists, but that each element from the smaller list can be paired with a unique element in the larger list.
I don't have a specific performance requirement. I really just want to know if there is a more "correct" way than the manual check. You could argue that "correct" may mean faster, or more readable, or simply easier to write by using inbuilt functions. Maybe there isn't a better way. I will add that my use case may involve checking the larger list against several smaller lists, so a one-time transformation of the larger list (eg. to a dictionary) is certainly a consideration.
You could count the occurrences of each element in both lists, and then check that the count of every element in l2 is less than or equal to the corresponding count in l1.
using System.Collections.Generic;

static Dictionary<T, int> Count<T>(List<T> l)
{
    Dictionary<T, int> c = new Dictionary<T, int>();
    foreach (var o in l)
    {
        if (!c.ContainsKey(o))
            c[o] = 1;
        else
            c[o]++;
    }
    return c;
}

static bool ContainsAll<T>(List<T> l1, List<T> l2)
{
    Dictionary<T, int> c1 = Count(l1);
    Dictionary<T, int> c2 = Count(l2);
    foreach (var kvp2 in c2)
    {
        // If c1 doesn't contain the current value
        // or its count is < the current value's count in l2
        // return false
        if (!c1.ContainsKey(kvp2.Key) || c1[kvp2.Key] < kvp2.Value)
            return false;
    }
    // All checks were successful, return true
    return true;
}
Of course, this approach involves building dictionaries, so you trade memory for speed, but it will be faster than checking with List.Contains(), because a lookup in a list is O(n).
If you turn a into a lookup, you can then ask whether it contains all of b's keys with a greater than or equal count:
var al = a.ToLookup(x=>x);
return b.ToLookup(x=>x).All(ble => al.Contains(ble.Key) && al[ble.Key].Count() >= ble.Count());
You can optimize this by building only one set of counts, then checking that it can "pay" for the second set of values:
var d = new Dictionary<int, int>();
a.ForEach(x => { if(d.ContainsKey(x)) d[x]++; else d[x] = 1;});
b.ForEach(x => { if(--d[x] < 0) throw new Exception($"More {x} in B than A"); });
In the last line of code, one of two things can happen:
the dictionary built from a does not contain the key from b, so a KeyNotFoundException is thrown
the count dips below zero (--d[x] returns the new value stored in the dictionary); if it is negative, there are more of key x in b than in a, and the explicit exception is thrown
Thus, if an exception is thrown, b is not a subset of a. If none is thrown, it is.
You can make it exception-free by expanding the second ForEach into a foreach loop, doing if(!d.ContainsKey(x) || --d[x] < 0) return false; inside the loop and return true after the loop.
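The exception-free variant described above could look roughly like this. It is a sketch rather than a definitive implementation: the class name MultisetCheck and the parameter names are made up for the example, and TryGetValue is used in place of ContainsKey plus indexer to avoid a second lookup.

```csharp
using System.Collections.Generic;

public static class MultisetCheck
{
    // Returns true when every element of sub can be paired with its own
    // distinct occurrence in main (duplicates are counted).
    public static bool ContainsAll<T>(IEnumerable<T> main, IEnumerable<T> sub)
        where T : notnull
    {
        // Count how many of each value main can "pay" for.
        var remaining = new Dictionary<T, int>();
        foreach (var x in main)
        {
            remaining.TryGetValue(x, out int n); // n is 0 when x is not present yet
            remaining[x] = n + 1;
        }

        // Spend one unit per element of sub; fail as soon as a value runs out.
        foreach (var x in sub)
        {
            if (!remaining.TryGetValue(x, out int n) || n == 0)
                return false;
            remaining[x] = n - 1;
        }
        return true;
    }
}
```

For the lists in the question, ContainsAll(a, b) is true and ContainsAll(a, c) is false. If the same main list is checked against several smaller lists, the counting loop could be hoisted out and the dictionary copied per check, which is the one-time transformation the question mentions.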
var list = new List<int>(){1,17,18,21,30};
Random rnd = new Random(DateTime.Now.Second);
int r;
do
{
r = rnd.Next(1, 30);
}
while (list.Contains(r));
But I think that's a poor solution. Can anyone suggest a more optimized approach?
It would be even better if there were a way to prevent the Random instance from returning a number that it has already returned.
In case anyone wonders why I need this: it is the first step in shuffling three byte arrays, combining them into one byte array, and producing three byte arrays that hold the indices of the original order in the original arrays.
Yes, one thing that makes it much more efficient is to use a HashSet<int> instead of a List<int>. Lookups in a HashSet are much faster than in a List (however, constructing a HashSet costs slightly more).
Also, if the input list always contains the same numbers, move it out of the function so that the cost of generating the HashSet is paid only once.
Because order does not matter here, in my personal experience (please test and profile for your own situation), once the list has more than about 14 items it is faster to convert it to a HashSet and do the lookup there than to do the lookup in the list itself.
var list = new List<int>(){1,17,18,21,30};
Random rnd = new Random(DateTime.Now.Second);
int r;
// In this example, with 5 items in the list, the HashSet will be slower due to the cost
// of creating it, but if we knew that the 5 items were fixed I would create it
// outside of the function so I would only have to pay the cost once per program
// start-up, and it would be faster again thanks to the amortized start-up cost.
var checkHashSet = new HashSet<int>(list);
do
{
r = rnd.Next(1, 30);
}
while (checkHashSet.Contains(r)); // test the value just drawn, not a fresh random number
You're right that looping isn't particularly efficient. You can use some handy extensions to select a number if you consider the constraint of the list of valid numbers, as opposed to the list of invalid ones.
So you have your list of invalid numbers:
var list = new List<int>(){1,17,18,21,30};
Which means that your list of valid numbers is the range from 1-30 except for these. Something like:
var validList = Enumerable.Range(1, 30).Except(list);
So we can use these extensions from the linked answer:
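// Extension methods must be declared inside a static class.
public static T RandomElement<T>(this IEnumerable<T> enumerable)
{
    return enumerable.RandomElementUsing(new Random());
}
public static T RandomElementUsing<T>(this IEnumerable<T> enumerable, Random rand)
{
    // Count() and ElementAt() each enumerate the sequence when it is not a collection.
    int index = rand.Next(0, enumerable.Count());
    return enumerable.ElementAt(index);
}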
And select a random element from the list of known valid numbers:
var kindOfRandomNumber = Enumerable.Range(1, 30).Except(list).RandomElement();
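If many numbers are drawn against the same exclusion list, one possible refinement is to materialize the valid values once and reuse a single Random instance, so the Except query is not re-run on every draw. The sketch below assumes an inclusive range; RandomPicker, BuildValidValues and Pick are hypothetical names, not part of the linked answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RandomPicker
{
    // Materialize every value in [min, max] that is not excluded.
    public static int[] BuildValidValues(int min, int max, IEnumerable<int> excluded)
        => Enumerable.Range(min, max - min + 1).Except(excluded).ToArray();

    // Draw one of the valid values; every draw is O(1) with no retry loop.
    public static int Pick(int[] validValues, Random rand)
    {
        if (validValues.Length == 0)
            throw new InvalidOperationException("No valid values to choose from.");
        return validValues[rand.Next(validValues.Length)];
    }
}
```

Usage would be along the lines of var valid = RandomPicker.BuildValidValues(1, 30, list); followed by RandomPicker.Pick(valid, rnd) for each number needed. This does not stop the same valid number from being drawn twice; for that, the array would have to be shuffled and consumed in order.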
IMPORTANT NOTE
To the people who flagged this as a duplicate, please understand we do NOT want a LINQ-based solution. Our real-world example has several original lists in the tens-of-thousands range and LINQ-based solutions are not performant enough for our needs since they have to walk the lists several times to perform their function, expanding with each new source list.
That is why we are specifically looking for a non-LINQ algorithm, such as the one suggested in this answer below where they walk all lists simultaneously, and only once, via enumerators. That seems to be the best so far, but I am wondering if there are others.
Now back to the question...
For the sake of explaining our issue, consider this hypothetical problem:
I have multiple lists, but to keep this example simple, let's limit it to two, ListA and ListB, both of which are of type List<int>. Their data is as follows:
List A    List B
1         2
2         3
4         4
5         6
6         8
8         9
9         10
...however the real lists can have tens of thousands of rows.
We next have a class called ListPairing that's simply defined as follows:
public class ListPairing
{
public int? ASide{ get; set; }
public int? BSide{ get; set; }
}
where each 'side' parameter really represents one of the lists. (i.e. if there were four lists, it would also have a CSide and a DSide.)
What we are trying to do is construct a List<ListPairing> with the data initialized as follows:
A Side    B Side
1         -
2         2
-         3
4         4
5         -
6         6
8         8
9         9
-         10
Again, note there is no row with '7'
As you can see, the results look like a full outer join. However, please see the update below.
Now to get things started, we can simply do this...
var finalList = ListA.Select(valA => new ListPairing(){ ASide = valA} );
Which yields...
A Side    B Side
1         -
2         -
4         -
5         -
6         -
8         -
9         -
and now we want to go back-fill the values from List B. This requires checking first if there is an already existing ListPairing with ASide that matches BSide and if so, setting the BSide.
If there is no existing ListPairing with a matching ASide, a new ListPairing is instantiated with only the BSide set (ASide is blank.)
However, I get the feeling that's not the most efficient way to do this considering all of the required 'FindFirst' calls it would take. (These lists can be tens of thousands of items long.)
However, taking a union of those lists once up front yields the following values...
1, 2, 3, 4, 5, 6, 8, 9, 10 (Note there is no #7)
My thinking was to somehow use that ordered union of the values, then 'walking' both lists simultaneously, building up ListPairings as needed. That eliminates repeated calls to FindFirst, but I'm wondering if that's the most efficient way to do this.
Thoughts?
Update
People have suggested this is a duplicate of getting a full outer join using LINQ because the results are the same...
I am not after a LINQ full outer join. I'm after a performant algorithm.
As such, I have updated the question.
The reason I bring this up is the LINQ needed to perform that functionality is much too slow for our needs. In our model, there are actually four lists, and each can be in the tens of thousands of rows. That's why I suggested the 'Union' approach of the IDs at the very end to get the list of unique 'keys' to walk through, but I think the posted answer on doing the same but with the enumerators is an even better approach as you don't need the list of IDs up front. This would yield a single pass through all items in the lists simultaneously which would easily out-perform the LINQ-based approach.
This didn't turn out as neat as I'd hoped, but if both input lists are sorted then you can just walk through them together comparing the head elements of each one: if they're equal then you have a pair, else emit the smallest one on its own and advance that list.
public static IEnumerable<ListPairing> PairUpLists(IEnumerable<int> sortedAList,
IEnumerable<int> sortedBList)
{
// Dispose both enumerators when iteration completes or is abandoned early
// (using declarations require C# 8 or later).
using var aEnum = sortedAList.GetEnumerator();
using var bEnum = sortedBList.GetEnumerator();
bool haveA = aEnum.MoveNext();
bool haveB = bEnum.MoveNext();
while (haveA && haveB)
{
// We still have values left on both lists.
int comparison = aEnum.Current.CompareTo(bEnum.Current);
if (comparison < 0)
{
// The heads of the two remaining sequences do not match and A's is
// lower. Generate a partial pair with the head of A and advance the
// enumerator.
yield return new ListPairing() {ASide = aEnum.Current};
haveA = aEnum.MoveNext();
}
else if (comparison == 0)
{
// The heads of the two sequences match. Generate a pair.
yield return new ListPairing() {
ASide = aEnum.Current,
BSide = bEnum.Current
};
// Advance both enumerators
haveA = aEnum.MoveNext();
haveB = bEnum.MoveNext();
}
else
{
// No match and B is the lowest. Generate a partial pair with B.
yield return new ListPairing() {BSide = bEnum.Current};
// and advance the enumerator
haveB = bEnum.MoveNext();
}
}
if (haveA)
{
// We still have elements on list A but list B is exhausted.
do
{
// Generate a partial pair for all remaining A elements.
yield return new ListPairing() { ASide = aEnum.Current };
} while (aEnum.MoveNext());
}
else if (haveB)
{
// List A is exhausted but we still have elements on list B.
do
{
// Generate a partial pair for all remaining B elements.
yield return new ListPairing() { BSide = bEnum.Current };
} while (bEnum.MoveNext());
}
}
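Since the question mentions that the real model has four lists, the same single-pass idea can be generalized to any number of sorted inputs. The following is only a sketch under stated assumptions: every input is sorted ascending without duplicates, and each output row is an int?[] with one slot per list rather than a ListPairing. SortedListAligner and Align are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SortedListAligner
{
    // Yields one row per distinct key; slot i is null when list i lacks that key.
    public static IEnumerable<int?[]> Align(params IEnumerable<int>[] sortedLists)
    {
        var enums = sortedLists.Select(l => l.GetEnumerator()).ToArray();
        try
        {
            // has[i] is true while list i still has a current (unconsumed) element.
            var has = enums.Select(e => e.MoveNext()).ToArray();
            while (has.Any(h => h))
            {
                // Find the smallest head value among the lists that are not exhausted.
                int min = int.MaxValue;
                for (int i = 0; i < enums.Length; i++)
                    if (has[i] && enums[i].Current < min) min = enums[i].Current;

                // Emit that value for every list whose head matches, and advance those lists.
                var row = new int?[enums.Length];
                for (int i = 0; i < enums.Length; i++)
                {
                    if (has[i] && enums[i].Current == min)
                    {
                        row[i] = min;
                        has[i] = enums[i].MoveNext();
                    }
                }
                yield return row;
            }
        }
        finally
        {
            foreach (var e in enums) e.Dispose();
        }
    }
}
```

Each element of each list is visited once. The scan for the minimum costs one comparison per list per row, which is negligible for four lists; with many lists a heap of the head values would be the usual replacement.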
var list1 = new List<int?>(){1,2,4,5,6,8,9};
var list2 = new List<int?>(){2,3,4,6,8,9,10};
var left = from i in list1
join k in list2 on i equals k
into temp
from k in temp.DefaultIfEmpty()
select new {a = i, b = (i == k) ? k : (int?)null};
var right = from k in list2
join i in list1 on k equals i
into temp
from i in temp.DefaultIfEmpty()
select new {a = (i == k) ? i : (int?)null, b = k};
var result = left.Union(right);
If you need the ordering to be the same as in your example, you will need to provide an index and order by that (then remove duplicates):
var result = left.Select((o,i) => new {o.a, o.b, i}).Union(right.Select((o, i) => new {o.a, o.b, i})).OrderBy( o => o.i);
result.Select( o => new {o.a, o.b}).Distinct();
Is there a way, with LINQ, to check if a list of integers is "sequential", i.e. 1,2,3,4,5 or 14,15,16,17,18?
You could do this via Enumerable.Zip:
bool sequential = values.Zip(values.Skip(1), (a,b) => (a+1) == b).All(x => x);
This works by taking each pair of values, and checking to see if the second is 1 more than the first, and returning booleans. If all pairs fit the criteria, the values are sequential.
Given that this is a list of integers, you can do this slightly more efficiently using:
bool sequential = values.Skip(1).Select((v,i) => v == (values[i]+1)).All(v => v);
This will only work on sequences which can be accessed by index. Note that we use values[i], not values[i-1], as the Skip call effectively shifts the indices.
bool isSequential = Enumerable.Range(values.Min(), values.Count())
.SequenceEqual(values);
One more option is to use Aggregate to iterate the sequence only once.
Note that, unlike the All suggested by Reed Copsey, Aggregate can't stop in the middle when the condition fails...
var s = new int[] {3,4,5,6}.ToList();
var isSequential = s.Aggregate
(
new {PrevValue = 0, isFirst = true, Success = true} ,
(acc, current) =>
new {
PrevValue = current,
isFirst = false,
Success = acc.Success && (acc.isFirst || (acc.PrevValue == current - 1))
}
)
.Success;
A fancier version would be an iterator that carries the previous value along, or special code that splits the iterator into "first and the rest", which would allow Reed's solution to be implemented with a single iteration for any enumerable.
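A minimal sketch of the "carry the previous value along" idea, written with an explicit enumerator so that it works on any IEnumerable<int>, iterates once, and stops at the first gap. The names SequenceChecks and IsSequential are hypothetical, and treating an empty sequence as sequential is an assumption chosen to match what All returns on an empty input.

```csharp
using System.Collections.Generic;

public static class SequenceChecks
{
    public static bool IsSequential(IEnumerable<int> values)
    {
        using (var e = values.GetEnumerator())
        {
            // Assumption: an empty sequence counts as sequential.
            if (!e.MoveNext()) return true;

            int prev = e.Current;
            while (e.MoveNext())
            {
                // Compare as long so that int.MaxValue + 1 cannot wrap around.
                if ((long)e.Current != (long)prev + 1) return false;
                prev = e.Current;
            }
            return true;
        }
    }
}
```

Unlike the Aggregate version, this returns as soon as a pair fails the check, and unlike the indexed version it does not need the source to support access by index.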
If you already know that the numbers in your list are unique and sorted, then the simplest check for sequential is just
lst[lst.Count - 1] - lst[0] == lst.Count - 1
This assumes at least one element in the list.
I'm trying to implement a paging algorithm for a dataset sortable via many criteria. Unfortunately, while some of those criteria can be implemented at the database level, some must be done at the app level (we have to integrate with another data source). We have a paging (actually infinite scroll) requirement and are looking for a way to minimize the pain of sorting the entire dataset at the app level with every paging call.
What is the best way to do a partial sort, only sorting the part of the list that absolutely needs to be sorted? Is there an equivalent to C++'s std::partial_sort function available in the .NET libraries? How should I go about solving this problem?
EDIT: Here's an example of what I'm going for:
Let's say I need to get elements 21-40 of a 1000 element set, according to some sorting criteria. In order to speed up the sort, and since I have to go through the whole dataset every time anyway (this is a web service over HTTP, which is stateless), I don't need the whole dataset ordered. I only need elements 21-40 to be correctly ordered. It is sufficient to create 3 partitions: Elements 1-20, unsorted (but all less than element 21); elements 21-40, sorted; and elements 41-1000, unsorted (but all greater than element 40).
OK. Here's what I would try based on what you said in reply to my comment.
I want to be able to say "4th through 6th" and get something like: 3,
2, 1 (unsorted, but all less than proper 4th element); 4, 5, 6 (sorted
and in the same place they would be for a sorted list); 8, 7, 9
(unsorted, but all greater than proper 6th element).
Let's add 10 to our list to make it easier: 10, 9, 8, 7, 6, 5, 4, 3, 2, 1.
So, what you could do is use the quickselect algorithm to find the ith and kth elements. In your case above, i is 4 and k is 6, which of course returns the values 4 and 6. That takes two passes through your list, so the runtime so far is O(2n) = O(n). The next part is easy. We now have lower and upper bounds on the data we care about, so all we need to do is make another pass through the list looking for any element that lies between those bounds. If we find such an element, we add it to a new List. Finally, we sort that List, which contains only the ith through kth elements that we care about.
So, I believe the total runtime ends up being O(N) + O((k-i)lg(k-i)).
static void Main(string[] args) {
//create a list of 10 million items that are randomly ordered
var list = Enumerable.Range(1, 10000000).OrderBy(x => Guid.NewGuid()).ToList();
var sw = Stopwatch.StartNew();
var slowOrder = list.OrderBy(x => x).Skip(10).Take(10).ToList();
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
//Took ~8 seconds on my machine
sw.Restart();
var smallVal = Quickselect(list, 11);
var largeVal = Quickselect(list, 20);
// ToList() forces the deferred query to run, so the filter and sort are actually timed
var elements = list.Where(el => el >= smallVal && el <= largeVal).OrderBy(el => el).ToList();
Console.WriteLine(sw.ElapsedMilliseconds);
//Took ~1 second on my machine
}
public static T Quickselect<T>(IList<T> list , int k) where T : IComparable {
Random rand = new Random();
int r = rand.Next(0, list.Count);
T pivot = list[r];
List<T> smaller = new List<T>();
List<T> larger = new List<T>();
foreach (T element in list) {
var comparison = element.CompareTo(pivot);
// CompareTo only guarantees the sign of its result, not exactly -1 or 1
if (comparison < 0) {
smaller.Add(element);
}
else if (comparison > 0) {
larger.Add(element);
}
}
if (k <= smaller.Count) {
return Quickselect(smaller, k);
}
else if (k > list.Count - larger.Count) {
return Quickselect(larger, k - (list.Count - larger.Count));
}
else {
return pivot;
}
}
You can use List<T>.Sort(int, int, IComparer<T>):
inputList.Sort(startIndex, count, Comparer<T>.Default);
Array.Sort() has an overload that accepts index and length arguments that lets you sort a subset of an array. The same exists for List.
You cannot sort an IEnumerable directly, of course.
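To illustrate what the range overload does and does not do, here is a small sketch; RangeSortDemo and SortRange are hypothetical names, and the method works on a copy so the caller's data is left untouched.

```csharp
using System.Collections.Generic;

public static class RangeSortDemo
{
    // Sorts only the elements at positions [startIndex, startIndex + count)
    // of a copy of the source; all other positions keep their values.
    public static List<int> SortRange(IEnumerable<int> source, int startIndex, int count)
    {
        var copy = new List<int>(source);
        copy.Sort(startIndex, count, Comparer<int>.Default);
        return copy;
    }
}
```

Calling SortRange on 5, 4, 3, 2, 1 with startIndex 1 and count 3 gives 5, 2, 3, 4, 1. Only the values that already sit in that index range are reordered, so on its own this is not equivalent to std::partial_sort: the range does not end up holding the elements that would occupy those positions in a fully sorted list unless a selection step has moved them there first.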