Algorithm for dynamic combinations - c#

My code has a list called INPUTS, that contains a dynamic number of lists, let's call them A, B, C, .. N. These lists contain a dynamic number of Events
I would like to call a function with each combination of Events. To illustrate with an example:
INPUTS: A(0,1,2), B(0,1), C(0,1,2,3)
I need to call my function this many times for each combination (the input count is dynamic, in this example it is three parameter, but it can be more or less)
function(A[0],B[0],C[0])
function(A[0],B[1],C[0])
function(A[0],B[0],C[1])
function(A[0],B[1],C[1])
function(A[0],B[0],C[2])
function(A[0],B[1],C[2])
function(A[0],B[0],C[3])
function(A[0],B[1],C[3])
function(A[1],B[0],C[0])
function(A[1],B[1],C[0])
function(A[1],B[0],C[1])
function(A[1],B[1],C[1])
function(A[1],B[0],C[2])
function(A[1],B[1],C[2])
function(A[1],B[0],C[3])
function(A[1],B[1],C[3])
function(A[2],B[0],C[0])
function(A[2],B[1],C[0])
function(A[2],B[0],C[1])
function(A[2],B[1],C[1])
function(A[2],B[0],C[2])
function(A[2],B[1],C[2])
function(A[2],B[0],C[3])
function(A[2],B[1],C[3])
This is what I have thought of so far:
My approach so far is to build a list of combinations. The element combination is itself a list of "index" to the input arrays A, B and C. For our example:
my list iCOMBINATIONS contains the following iCOMBO lists
(0,0,0)
(0,1,0)
(0,0,1)
(0,1,1)
(0,0,2)
(0,1,2)
(0,0,3)
(0,1,3)
(1,0,0)
(1,1,0)
(1,0,1)
(1,1,1)
(1,0,2)
(1,1,2)
(1,0,3)
(1,1,3)
(2,0,0)
(2,1,0)
(2,0,1)
(2,1,1)
(2,0,2)
(2,1,2)
(2,0,3)
(2,1,3)
Then I would do this:
foreach( iCOMBO in iCOMBINATIONS)
{
foreach ( P in INPUTS )
{
COMBO.Clear()
foreach ( i in iCOMBO )
{
COMBO.Add( P[ iCOMBO[i] ] )
}
function( COMBO ) --- (instead of passing the events separately)
}
}
But I need to find a way to build the list iCOMBINATIONS for any given number of INPUTS and their events. Any ideas?
Is there actually a better algorithm than this?
any pseudo code to help me with will be great.
C# (or VB)
Thank You

You can use an array to hold the indexes for each list. Example:
List<List<int>> lists = new List<List<int>> {
new List<int> { 0,1,2 },
new List<int> { 0,1 },
new List<int> { 0,1,2,3 }
};
int[] cnt = new int[lists.Count];
int index;
do {
Console.WriteLine(String.Join(",", cnt.Select((c,i) => lists[i][c].ToString()).ToArray()));
index = cnt.Length - 1;
do {
cnt[index] = (cnt[index] + 1) % lists[index].Count;
} while(cnt[index--] == 0 && index != -1);
} while (index != -1 || cnt[0] != 0);

This is permutation problem. You may take a look at this:
http://www.interact-sw.co.uk/iangblog/2004/09/16/permuterate

I had similar problem some time ago (generating combinations), I've used code from: http://www.merriampark.com/comb.htm . It's java, but I hadn't any problems to translate it into C#.

Put A,B,C in matrix!
M=[A,B,C]
recursive_caller(d,params):
if d == len(M):
function(params)
return
for i in M[d]:
params[d]=i
recursive_caller(d+1,params)

It would seem that what you really want, is neither a permutation, nor a combination, per se. You want to look at the cartesian product (see here) of several sets, the iteration over which may involve iterating through combinations of individual sets.
However, this is unlike a combination problem, because you are looking for the ways to choose 1 element from each set. The number of ways to do this is the size of the set. Combinations problems usually involve choose k-many things from a set of n-many things, where k=1 or n is trivial.
Several methods of producing iterators in C# have been discussed here. (Including one by Jon Skeet).
If you are using .NET, you may also be interested in developed combinatorics modules, such as KwCombinatorics at CodePlex.
edit Now, with LINQ to the rescue:
private void cartesian1()
{
textAppend("Cartesian 1");
var setA = new[] { "whole wheat", "white", "rye" };
var setB = new[] { "cold cut", "veggie", "turkey", "roast beef" };
var setC = new[] { "everything", "just mayo" };
var query =
from bread in setA
from meat in setB
from toppings in setC
let sandwich = String.Format("{1} on {0} with {2}",
bread, meat, toppings)
select sandwich;
foreach( string sandwich in query )
{
textAppend(sandwich);
}
}

A modified version of #Guffa's answer. I am by no means a creator of this code.
List<int> lists = new List<int> { 3, 2, 4 };
int[] cnt = new int[lists.Count];
int index;
do
{
Console.WriteLine(String.Join(",", cnt));
index = cnt.Length - 1;
do
{
cnt[index] = (cnt[index] + 1) % lists[index];
} while (cnt[index--] == 0 && index != -1);
} while (index != -1 || cnt[0] != 0);
Instead of using List<List<int>> - with possible values - use List<int> describing the amount of elements in collection. The output is the same an in original answer. The performance is better.

Related

Sort two dimensional array with different lengths

Hi fellow programmers,
I am trying to sort a two dimensional array. This array represent a collection of objects which has a property which have a value in the list underneath. So the original list does not have to be saved.
The starting situation is:
var list = new List<string>
{
"10-158-6",
"11-158-6",
"90-158-6",
"20-15438-6",
"10-158-6",
"10-158-6-3434",
"10-1528-6"
};
The result should be
var list = new List<string>
{
"10-158-6",
"10-158-6",
"10-1528-6"
"10-158-6-3434",
"11-158-6",
"20-15438-6",
"90-158-6",
};
It should be first ordered on the first part -> then the second -> etc. etc.
I think it is almost impossible to sort these strings so I converted it to a two-dimensional list. I found different solutions to sort multi dimensional list but none can be used for this problem. Also I do not have a clue where to start...
Anyone has an idea how to write a sorting algorithm that doesn't have unnecessary huge big O?
Thanks in advance!
Jeroen
You can use Sort method; let's implement a general case with arbitrary long numbers:
Code:
var list = new List<string>() {
"10-158-6",
"11-158-6",
"90-158-6",
"20-15438-6",
"10-158-6",
"10-158-6-3434",
"10-1528-6",
"123456789012345678901234567890"
};
list.Sort((left, right) => {
var x = left.Split('-');
var y = right.Split('-');
// Compare numbers:
for (int i = 0; i < Math.Min(x.Length, y.Length); ++i) {
// Longer number is always bigger: "123" > "99"
int r = x[i].Length.CompareTo(y[i].Length);
// If numbers are of the same length, compare lexicographically: "459" < "460"
if (r == 0)
r = string.CompareOrdinal(x[i], y[i]);
if (r != 0)
return r;
}
// finally, the more items the bigger: "123-456-789" > "123-456"
return x.Length.CompareTo(y.Length);
});
// Let's have a look at the list after the sorting
Console.Write(string.Join(Environment.NewLine, list));
Outcome:
10-158-6
10-158-6
10-158-6-3434 // <- please, note that since 158 < 1528
10-1528-6 // <- "10-158-6-3434" is before "10-1528-6"
11-158-6
20-15438-6
90-158-6
123456789012345678901234567890
Those look like Version number. If a change from Dash to Dot are not a big change you can simply use C# Version
var list = new List<string>
{
"10-158-6",
"11-158-6",
"90-158-6",
"20-15438-6",
"10-158-6",
"10-158-6-3434",
"10-1528-6"
};
var versions = list.Select(x => new Version(x.Replace('-','.'))).ToList();
versions.Sort();
LiveDemo

Avoiding 'System.OutOfMemoryException' while iterating over a IEnumerable<IEnumerable<object>>

I have the following code to get the cheapest List of objects which satisfy the requiredNumbers criteria. This list of objects can have a length varying from 1 to maxLength, i.e. there can be a combination of 1 to maxLength of objects with repitition allowed. Right now, this this iterates over the whole list of combinations (IEnumerable of IEnumerable of OBJECT) fine till maxLength = 9 and breaks after that with a "System.OutOfMemoryException" at
t1.Concat(new OBJECT[] { t2 }
I tried another approach to solve this (mentioned in the code comments), but that seems to have its own demons. What I understand right now is , I'll have to somehow know the least priced combination of objects without iterating over the whole List of combination, which I can't seem to find feasible.
Could someone suggest any changes that let the maxLength be higher(much higher ideally), without hindering the performance. Any help is much appreciated. Please let me know if I am not clear.
private static int leastPrice = int.MaxValue;
private IEnumerable<IEnumerable<OBJECT>> CombinationOfObjects(IEnumerable<OBJECT> objects, int length)
{
if (length == 1)
return objects.Select(t => new OBJECT[] { t });
return CombinationOfObjects(objects, length - 1).SelectMany(t => objects, (t1, t2) => t1.Concat(new OBJECT[] { t2 }));
}
//Gets the least priced Valid combination out of all possible
public IEnumerable<OBJECT> GetValidCombination(IEnumerable<OBJECT> list, int maxLength, int[] matArray)
{
IEnumerable<IEnumerable<OBJECT>> tempList = null;
List<IEnumerable<OBJECT>> validList = new List<IEnumerable<OBJECT>>();
for (int i = 1; i <= maxLength; i++)
{
tempList = CombinationOfObjects(list, i);
tempList = from alist in tempList
orderby alist.Sum(x => x.Price)
select alist;
foreach (var lst in tempList)
{
//This check will not be required if the least priced value is returned as soon as found
int price = lst.Sum(c => c.Price);
if (price < leastPrice)
{
if (CheckMaterialSum(lst, matArray))
{
validList.Add(lst);
leastPrice = price;
break;
//return lst;
//returning lst as soon as valid combo is found is fastest
//Con being it also returns the least priced least item containing combo
//i.e. even if a 4 item combo is cheaper than the 2 item combo satisfying the need,
//it'll never even check for the 4 item combo
}
}
}
}
//This whole thing would go too if lst was returned earlier
foreach (IEnumerable<OBJECT> combination in validList)
{
int priceTotal = combination.Sum(combo => combo.Price);
if (priceTotal == leastPrice)
{
return combination;
}
}
return new List<OBJECT>();
}
//Checks if the given combination satisfies the requirement
private bool CheckMaterialSum(IEnumerable<OBJECT> combination, int[] matArray)
{
int[] sumMatProp = new int[matArray.Count()];
for (int i = 0; i < matArray.Count(); i++)
{
sumMatProp[i] = combination.Sum(combo => combo.Numbers[i]);
}
bool isCombinationValid = matArray.Zip(sumMatProp, (requirement, c) => c >= requirement).All(comboValid => comboValid);
return isCombinationValid;
}
static void Main(string[] args)
{
List<OBJECT> testList = new List<OBJECT>();
OBJECT object1 = new OBJECT();
object1.Name = "object1";
object1.Price = 2000;
object1.Numbers = new int[] { 2, 3, 4 };
testList.Add(object1);
OBJECT object2 = new OBJECT();
object2.Name = "object2";
object2.Price = 1900;
object2.Numbers = new int[] { 3, 2, 4 };
testList.Add(object1);
OBJECT object3 = new OBJECT();
object3.Name = "object3";
object3.Price = 1600;
object3.Numbers = new int[] { 4, 3, 2 };
testList.Add(object1);
int requiredNumbers = new int[]{10,10,10};
int maxLength = 9;//This is the max length possible, OutOf Mememory exception after this
IEnumerable<OBJECT> resultCombination = GetValidCombination(testList, maxLength, requiredNumbers);
}
EDIT
Requirement:
I have a number of objects having several properties, namely, Price, Name , and Materials. Now, I need to find such a combination of these objects that the sum of all materials in a combination satisfies the user input qty of materials. Also, the combination needs to be of least price possible.
There is a constraint of maxLength and it sets the maximum total number of objects that can be in a combination, i.e. for a maxLength = 8, the combination may contain anywhere from 1 to 8 objects.
Approaches tried:
1.
-I find all combinations of objects possible (valid + invalid)
-Iterate over them to find the least priced combination. This goes out of memory while iterating.
2.
-I find all combinations possible (valid + invalid)
-Apply a validity check (i.e if it fulfills the user requirement)
-Add only valid combinations in a List of List
-Iterate over this valid List of lists to find the cheapest list and return that. Also goes out of memory
3.
-I find combinations in increasing order of objects (i.e. first all combinations having 1 object, then 2 then so on...)
-Sort the combinations according to price
-Apply validity check and return the first valid combination
-Now this works fine performance wise, but does not always return the cheapest possible combination.
If I could somehow get the optimal solution without iterating over the whole list , that would solve it. But, all of the things that I've tried either have to iterate over all combinations or simply do not result in the optimal solution.
Any help regarding even some other approach that I can't seem to think of is most welcome.

Get all possible distinct triples using LINQ

I have a List contains these values: {1, 2, 3, 4, 5, 6, 7}. And I want to be able to retrieve unique combination of three. The result should be like this:
{1,2,3}
{1,2,4}
{1,2,5}
{1,2,6}
{1,2,7}
{2,3,4}
{2,3,5}
{2,3,6}
{2,3,7}
{3,4,5}
{3,4,6}
{3,4,7}
{3,4,1}
{4,5,6}
{4,5,7}
{4,5,1}
{4,5,2}
{5,6,7}
{5,6,1}
{5,6,2}
{5,6,3}
I already have 2 for loops that able to do this:
for (int first = 0; first < test.Count - 2; first++)
{
int second = first + 1;
for (int offset = 1; offset < test.Count; offset++)
{
int third = (second + offset)%test.Count;
if(Math.Abs(first - third) < 2)
continue;
List<int> temp = new List<int>();
temp .Add(test[first]);
temp .Add(test[second]);
temp .Add(test[third]);
result.Add(temp );
}
}
But since I'm learning LINQ, I wonder if there is a smarter way to do this?
UPDATE: I used this question as the subject of a series of articles starting here; I'll go through two slightly different algorithms in that series. Thanks for the great question!
The two solutions posted so far are correct but inefficient for the cases where the numbers get large. The solutions posted so far use the algorithm: first enumerate all the possibilities:
{1, 1, 1 }
{1, 1, 2 },
{1, 1, 3 },
...
{7, 7, 7}
And while doing so, filter out any where the second is not larger than the first, and the third is not larger than the second. This performs 7 x 7 x 7 filtering operations, which is not that many, but if you were trying to get, say, permutations of ten elements from thirty, that's 30 x 30 x 30 x 30 x 30 x 30 x 30 x 30 x 30 x 30, which is rather a lot. You can do better than that.
I would solve this problem as follows. First, produce a data structure which is an efficient immutable set. Let me be very clear what an immutable set is, because you are likely not familiar with them. You normally think of a set as something you add items and remove items from. An immutable set has an Add operation but it does not change the set; it gives you back a new set which has the added item. The same for removal.
Here is an implementation of an immutable set where the elements are integers from 0 to 31:
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System;
// A super-cheap immutable set of integers from 0 to 31 ;
// just a convenient wrapper around bit operations on an int.
internal struct BitSet : IEnumerable<int>
{
public static BitSet Empty { get { return default(BitSet); } }
private readonly int bits;
private BitSet(int bits) { this.bits = bits; }
public bool Contains(int item)
{
Debug.Assert(0 <= item && item <= 31);
return (bits & (1 << item)) != 0;
}
public BitSet Add(int item)
{
Debug.Assert(0 <= item && item <= 31);
return new BitSet(this.bits | (1 << item));
}
public BitSet Remove(int item)
{
Debug.Assert(0 <= item && item <= 31);
return new BitSet(this.bits & ~(1 << item));
}
IEnumerator IEnumerable.GetEnumerator() { return this.GetEnumerator(); }
public IEnumerator<int> GetEnumerator()
{
for(int item = 0; item < 32; ++item)
if (this.Contains(item))
yield return item;
}
public override string ToString()
{
return string.Join(",", this);
}
}
Read this code carefully to understand how it works. Again, always remember that adding an element to this set does not change the set. It produces a new set that has the added item.
OK, now that we've got that, let's consider a more efficient algorithm for producing your permutations.
We will solve the problem recursively. A recursive solution always has the same structure:
Can we solve a trivial problem? If so, solve it.
If not, break the problem down into a number of smaller problems and solve each one.
Let's start with the trivial problems.
Suppose you have a set and you wish to choose zero items from it. The answer is clear: there is only one possible permutation with zero elements, and that is the empty set.
Suppose you have a set with n elements in it and you want to choose more than n elements. Clearly there is no solution, not even the empty set.
We have now taken care of the cases where the set is empty or the number of elements chosen is more than the number of elements total, so we must be choosing at least one thing from a set that has at least one thing.
Of the possible permutations, some of them have the first element in them and some of them do not. Find all the ones that have the first element in them and yield them. We do this by recursing to choose one fewer elements on the set that is missing the first element.
The ones that do not have the first element in them we find by enumerating the permutations of the set without the first element.
static class Extensions
{
public static IEnumerable<BitSet> Choose(this BitSet b, int choose)
{
if (choose < 0) throw new InvalidOperationException();
if (choose == 0)
{
// Choosing zero elements from any set gives the empty set.
yield return BitSet.Empty;
}
else if (b.Count() >= choose)
{
// We are choosing at least one element from a set that has
// a first element. Get the first element, and the set
// lacking the first element.
int first = b.First();
BitSet rest = b.Remove(first);
// These are the permutations that contain the first element:
foreach(BitSet r in rest.Choose(choose-1))
yield return r.Add(first);
// These are the permutations that do not contain the first element:
foreach(BitSet r in rest.Choose(choose))
yield return r;
}
}
}
Now we can ask the question that you need the answer to:
class Program
{
static void Main()
{
BitSet b = BitSet.Empty.Add(1).Add(2).Add(3).Add(4).Add(5).Add(6).Add(7);
foreach(BitSet result in b.Choose(3))
Console.WriteLine(result);
}
}
And we're done. We have generated only as many sequences as we actually need. (Though we have done a lot of set operations to get there, but set operations are cheap.) The point here is that understanding how this algorithm works is extremely instructive. Recursive programming on immutable structures is a powerful tool that many professional programmers do not have in their toolbox.
You can do it like this:
var data = Enumerable.Range(1, 7);
var r = from a in data
from b in data
from c in data
where a < b && b < c
select new {a, b, c};
foreach (var x in r) {
Console.WriteLine("{0} {1} {2}", x.a, x.b, x.c);
}
Demo.
Edit: Thanks Eric Lippert for simplifying the answer!
var ints = new int[] { 1, 2, 3, 4, 5, 6, 7 };
var permutations = ints.SelectMany(a => ints.Where(b => (b > a)).
SelectMany(b => ints.Where(c => (c > b)).
Select(c => new { a = a, b = b, c = c })));

Transform sequence in Linq while being aware of each Select/SelectMany result

Here's my problem. I have one specific list, which I'll present as a int[] for simplicity's sake.
int[] a = {1,2,3,4,5};
Suppose I need to transform each item on this list, but depending on the situation, I may return an int or an array of ints.
As an example, suppose I need to return {v} if the value is odd, and {v,v+1} if the value is even. I've done this:
int[] b = a.SelectMany(v => v % 2 == 0 ? new int[] { v, v+1 } : new int[] { v })
.ToArray();
So if I run this, I'll get the expected response:
{1,2,3,3,4,5,5}
See that I have repeating numbers, right? 3 and 5. I don't want those repeating numbers. Now, you may tell me that I can just call .Distinct() after processing the array.
This is the problem. The SelectMany clause is fairly complex (I just made up a simpler example), and I definitely don't want to process 3 if it's already present in the list.
I could check if 3 is present in the original list. But if I got 3 in the SelectMany clause, I don't want to get it again. For instance, if I had this list:
int[] a = {1,2,3,4,5,2};
I would get this:
{1,2,3,3,4,5,5,2,3}
Thus returning v (my original value) and v+1 again at the end. Just so you can understand it better v+1 represents some processing I want to avoid.
Summarizing, this is what I want:
I have a list of objects. (Check)
I need to filter them, and depending on the result, I may need to return more than one object. (Check, used SelectMany)
I need them to be distinct, but I can't do that at the end of the process. I should be able to return just {v} if {v+1} already exists. (Clueless...)
One thing I thought about is writing a custom SelectMany which may suit my needs, but I want to be sure there's no built-in way to do this.
EDIT: I believe I may have mislead you guys with my example. I know how to figure out if v+1 is in a list. To be clear, I have one object which has 2 int properties, Id and IdParent. I need to "yield return" all the objects and their parents. But I just have the ParentId, which comes from the objects themselves. I'm able to know if v+1 is in the list because I can check if any object there has the same Id as the ParentId I'm checking.
ANSWER: I ended up using Aggregate, which can be used to do exactly what I'm looking for.
Does this simple loop with the HashSet<int> help?
int[] a = {1,2,3,4,5,2};
var aLookupList = new HashSet<int>();
foreach (int i in a)
{
bool isEven = i % 2 == 0;
if (isEven)
{
aLookupList.Add(i);
aLookupList.Add(i + 1);
}
else
{
aLookupList.Add(i);
}
}
var result = aLookupList.ToArray();
What about this using Aggregate method. You won't be processing numbers that are already in the list, wheather they were in the original list or as a result of applying (v + 1)
int[] v = { 1, 2, 3, 4, 5, 2 };
var result = v.Aggregate(new List<int>(),
(acc, next) =>
{
if (!acc.Contains(next))
return (next % 2 == 0) ? acc.Concat(new int[] { next, next + 1 }).ToList()
: acc.Concat(new int[] { next }).ToList();
else
return acc;
}).ToArray();
var existing = new HashSet<int>(a);
var result = existing
.Where(v => v % 2 == 0 && !existing.Contains(v + 1))
.Select(v => v + 1)
.Concat(existing)
.ToArray();
As I understand you have this input:
int[] a = {1,2,3,4,5};
And the output should also be {1,2,3,4,5} because you don't want duplicated numbers as you describe.
Because you use an array as input, you can try this code:
var output = a.SelectMany((x,i)=> x % 2 == 0 ? new []{x,x+1} :
i > 0 && a[i-1]==x-1 ? new int[]{} : new []{x});
//if the input is {1,2,4,5}
//The output is also {1,2,3,4,5}

What is the fastest non-LINQ algorithm to 'pair up' matching items from multiple separate lists?

IMPORTANT NOTE
To the people who flagged this as a duplicate, please understand we do NOT want a LINQ-based solution. Our real-world example has several original lists in the tens-of-thousands range and LINQ-based solutions are not performant enough for our needs since they have to walk the lists several times to perform their function, expanding with each new source list.
That is why we are specifically looking for a non-LINQ algorithm, such as the one suggested in this answer below where they walk all lists simultaneously, and only once, via enumerators. That seems to be the best so far, but I am wondering if there are others.
Now back to the question...
For the sake of explaining our issue, consider this hypothetical problem:
I have multiple lists, but to keep this example simple, let's limit it to two, ListA and ListB, both of which are of type List<int>. Their data is as follows:
List A List B
1 2
2 3
4 4
5 6
6 8
8 9
9 10
...however the real lists can have tens of thousands of rows.
We next have a class called ListPairing that's simply defined as follows:
public class ListPairing
{
public int? ASide{ get; set; }
public int? BSide{ get; set; }
}
where each 'side' parameter really represents one of the lists. (i.e. if there were four lists, it would also have a CSide and a DSide.)
We are trying to do is construct a List<ListPairing> with the data initialized as follows:
A Side B Side
1 -
2 2
- 3
4 4
5 -
6 6
8 8
9 9
- 10
Again, note there is no row with '7'
As you can see, the results look like a full outer join. However, please see the update below.
Now to get things started, we can simply do this...
var finalList = ListA.Select(valA => new ListPairing(){ ASide = valA} );
Which yields...
A Side B Side
1 -
2 -
4 -
5 -
6 -
8 -
9 -
and now we want to go back-fill the values from List B. This requires checking first if there is an already existing ListPairing with ASide that matches BSide and if so, setting the BSide.
If there is no existing ListPairing with a matching ASide, a new ListPairing is instantiated with only the BSide set (ASide is blank.)
However, I get the feeling that's not the most efficient way to do this considering all of the required 'FindFirst' calls it would take. (These lists can be tens of thousands of items long.)
However, taking a union of those lists once up front yields the following values...
1, 2, 3, 4, 5, 6, 8, 9, 10 (Note there is no #7)
My thinking was to somehow use that ordered union of the values, then 'walking' both lists simultaneously, building up ListPairings as needed. That eliminates repeated calls to FindFirst, but I'm wondering if that's the most efficient way to do this.
Thoughts?
Update
People have suggested this is a duplicate of getting a full outer join using LINQ because the results are the same...
I am not after a LINQ full outer join. I'm after a performant algorithm.
As such, I have updated the question.
The reason I bring this up is the LINQ needed to perform that functionality is much too slow for our needs. In our model, there are actually four lists, and each can be in the tens of thousands of rows. That's why I suggested the 'Union' approach of the IDs at the very end to get the list of unique 'keys' to walk through, but I think the posted answer on doing the same but with the enumerators is an even better approach as you don't need the list of IDs up front. This would yield a single pass through all items in the lists simultaneously which would easily out-perform the LINQ-based approach.
This didn't turn out as neat as I'd hoped, but if both input lists are sorted then you can just walk through them together comparing the head elements of each one: if they're equal then you have a pair, else emit the smallest one on its own and advance that list.
public static IEnumerable<ListPairing> PairUpLists(IEnumerable<int> sortedAList,
IEnumerable<int> sortedBList)
{
// Should wrap these two in using() per Servy's comment with braces around
// the rest of the method.
var aEnum = sortedAList.GetEnumerator();
var bEnum = sortedBList.GetEnumerator();
bool haveA = aEnum.MoveNext();
bool haveB = bEnum.MoveNext();
while (haveA && haveB)
{
// We still have values left on both lists.
int comparison = aEnum.Current.CompareTo(bEnum.Current);
if (comparison < 0)
{
// The heads of the two remaining sequences do not match and A's is
// lower. Generate a partial pair with the head of A and advance the
// enumerator.
yield return new ListPairing() {ASide = aEnum.Current};
haveA = aEnum.MoveNext();
}
else if (comparison == 0)
{
// The heads of the two sequences match. Generate a pair.
yield return new ListPairing() {
ASide = aEnum.Current,
BSide = bEnum.Current
};
// Advance both enumerators
haveA = aEnum.MoveNext();
haveB = bEnum.MoveNext();
}
else
{
// No match and B is the lowest. Generate a partial pair with B.
yield return new ListPairing() {BSide = bEnum.Current};
// and advance the enumerator
haveB = bEnum.MoveNext();
}
}
if (haveA)
{
// We still have elements on list A but list B is exhausted.
do
{
// Generate a partial pair for all remaining A elements.
yield return new ListPairing() { ASide = aEnum.Current };
} while (aEnum.MoveNext());
}
else if (haveB)
{
// List A is exhausted but we still have elements on list B.
do
{
// Generate a partial pair for all remaining B elements.
yield return new ListPairing() { BSide = bEnum.Current };
} while (bEnum.MoveNext());
}
}
var list1 = new List<int?>(){1,2,4,5,6,8,9};
var list2 = new List<int?>(){2,3,4,6,8,9,10};
var left = from i in list1
join k in list2 on i equals k
into temp
from k in temp.DefaultIfEmpty()
select new {a = i, b = (i == k) ? k : (int?)null};
var right = from k in list2
join i in list1 on k equals i
into temp
from i in temp.DefaultIfEmpty()
select new {a = (i == k) ? i : (int?)i , b = k};
var result = left.Union(right);
If you need the ordering to be same as your example, then you will need to provide an index and order by that (then remove duplicates)
var result = left.Select((o,i) => new {o.a, o.b, i}).Union(right.Select((o, i) => new {o.a, o.b, i})).OrderBy( o => o.i);
result.Select( o => new {o.a, o.b}).Distinct();

Categories

Resources