Merge intersecting items in a list of lists

I have a list of lists, and I would like to merge all lists that contain identical values into new lists using LINQ. Here's an example:
var animalGroups = new List<List<Animal>>{
new List<Animal>{lizard,cat,cow,dog},
new List<Animal>{horse, chicken, pig, turkey},
new List<Animal>{ferret,duck,cat,parrot},
new List<Animal>{chicken,sheep,horse,rabbit}
};
The desired output would be a new List<List<Animal>> containing the following lists:
{lizard, cat, cow, dog, ferret, duck, parrot}
{horse, chicken, pig, turkey, sheep, rabbit}
I'm rather new to LINQ and I got stuck at grouping the intersecting lists without creating duplicates.

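Here is a possible solution using lists of strings. Note that it only combines pairs of lists that intersect directly, so a list that overlaps with nothing is dropped and chains of three or more overlapping lists are not merged into one: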
var animalGroups = new List<List<string>>
{
new List<string> {"lizard", "cat", "cow", "dog"},
new List<string> {"horse", "chicken", "pig", "turkey"},
new List<string> {"ferret", "duck", "cat", "parrot"},
new List<string> {"chicken", "sheep", "horse", "rabbit"}
};
List<List<string>> mergedList = new List<List<string>>();
for (int i = 0; i < animalGroups.Count; i++)
{
for (int j = i+1; j < animalGroups.Count; j++)
{
if (animalGroups[i].Intersect(animalGroups[j]).Any())
{
mergedList.Add(animalGroups[i].Concat(animalGroups[j]).Distinct().ToList());
}
}
}

First, remember to override Equals and GetHashCode and/or implement IEquatable<Animal> in your Animal class in a meaningful way (for example, by comparing the Name).
List<IEnumerable<Animal>> mergedLists = animalGroups.MergeIntersectingLists().ToList();
It uses the following extension method, which works with any type:
public static IEnumerable<IEnumerable<T>> MergeIntersectingLists<T>(this IEnumerable<IEnumerable<T>> itemLists, IEqualityComparer<T> comparer = null)
{
if (comparer == null) comparer = EqualityComparer<T>.Default;
var itemListDict = new Dictionary<T, HashSet<T>>(comparer);
foreach (IEnumerable<T> sequence in itemLists)
{
IList<T> list = sequence as IList<T> ?? sequence.ToList();
HashSet<T> itemStorage = null;
list.FirstOrDefault(i => itemListDict.TryGetValue(i, out itemStorage));
// FirstOrDefault will initialize the itemStorage because its an out-parameter
bool partOfListIsContainedInOther = itemStorage != null;
if (partOfListIsContainedInOther)
{
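// add this list to the other storage (a HashSet that removes duplicates)
// and register new items so that later lists can find this storage through them
foreach (T item in list)
{
itemStorage.Add(item);
if (!itemListDict.ContainsKey(item))
itemListDict.Add(item, itemStorage);
}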
else
{
itemStorage = new HashSet<T>(list, comparer);
// each items needs to be added to the dictionary, all have the same storage
foreach (T item in itemStorage)
itemListDict.Add(item, itemStorage); // same storage for all
}
}
// Distinct removes duplicate HashSets because of reference equality
// needed because item was the key and it's storage the value
// and those HashSets are the same reference
return itemListDict.Values.Distinct();
}
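The extension method above assigns each list to the first existing group it overlaps. When a later list overlaps two groups that were separate until then, all of them have to be folded into a single group. The following is a minimal sketch of that idea, not the answerer's method; the names ListMerging and MergeConnected are made up for illustration, and it assumes the items are non-null and usable as dictionary keys.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ListMerging
{
    // Hypothetical helper: merges all lists that share at least one item,
    // including lists that are only connected through a third list.
    public static List<List<T>> MergeConnected<T>(
        IEnumerable<IEnumerable<T>> itemLists, IEqualityComparer<T> comparer = null)
    {
        comparer = comparer ?? EqualityComparer<T>.Default;
        // maps every item seen so far to the group that currently owns it
        var groupOf = new Dictionary<T, HashSet<T>>(comparer);
        // groups in creation order
        var groups = new List<HashSet<T>>();

        foreach (var sequence in itemLists)
        {
            var current = new HashSet<T>(sequence, comparer);
            // every existing group that shares an item with the current list
            // (Distinct compares the HashSets by reference)
            var touched = current
                .Where(item => groupOf.ContainsKey(item))
                .Select(item => groupOf[item])
                .Distinct()
                .ToList();

            HashSet<T> target;
            if (touched.Count == 0)
            {
                target = new HashSet<T>(comparer);
                groups.Add(target);
            }
            else
            {
                target = touched[0];
                // fold every other touched group into the first one
                foreach (var other in touched.Skip(1))
                {
                    target.UnionWith(other);
                    groups.Remove(other);
                }
            }
            target.UnionWith(current);
            // point every item of the merged group at the surviving set
            foreach (var item in target)
                groupOf[item] = target;
        }
        return groups.Select(g => g.ToList()).ToList();
    }
}
```

With this sketch, the lists {1, 2}, {3, 4} and {2, 3} end up in one group of four items, because the third list connects the first two.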

Your question is a vague one. In case you want to combine the lists at even indexes (0, 2, 4, ... 2n) as well as the lists at odd indexes (1, 3, 5, ... 2n - 1), and you are looking for a LINQ solution:
// I don't have Animal class, that's why I've put string
// Be sure that Animal implements Equals as well as GetHashCode methods
var animalGroups = new List<List<string>> {
new List<string> {"lizard", "cat", "cow", "dog"},
new List<string> {"horse", "chicken", "pig", "turkey"},
new List<string> {"ferret", "duck", "cat", "parrot"},
new List<string> {"chicken", "sheep", "horse", "rabbit"}
};
var result = animalGroups
.Select((list, index) => new {
list = list,
index = index, })
.GroupBy(item => item.index % 2, // grouping 0, 2, ... 2n as well as 1, 3,... 2n - 1
item => item.list)
.Select(chunk => chunk
.SelectMany(c => c)
.Distinct()
.ToList())
.ToList();
Let's visualize the result:
string test = string.Join(Environment.NewLine, result
.Select(list => string.Join(", ", list)));
Console.WriteLine(test);
Outcome
lizard, cat, cow, dog, ferret, duck, parrot
horse, chicken, pig, turkey, sheep, rabbit

Related

c# combine x amount of items from 2 lists into a 3'rd list

I have two List<T>
List<T> listA = new List<T>();
List<T> listB = new List<T>();
each list contains a certain amount of items.
How can I create a List<List<T>> that will contain 2 items from listA and 4 items from listB?
Example:
public class Item
{
public string Category { get; set; }
}
List<Item> listA = new List<Item>();
//repeat the code below until listA contains 8 items of category Apple (the amount 8 is just an example)
listA.Add(new Item()
{
Category = "Apple"
});
List<Item> listB = new List<Item>();
//repeat the code below until listB contains 16 items of category Pear (the amount 16 is just an example)
listB.Add(new Item()
{
Category = "Pear"
});
So listA contains 8 items of category Apple and listB contains 16 items of category Pear. I want to create individual List<Item> instances, each containing 2 items from listA and 4 items from listB, so I will end up with 4 List<Item>.
List<Item> list1 // contains 2 items from ListA and 4 items from ListB
List<Item> list2 // contains 2 items from ListA and 4 items from ListB
List<Item> list3 // contains 2 items from ListA and 4 items from ListB
List<Item> list4 // contains 2 items from ListA and 4 items from ListB
Now the 4 lists above can be added to a List<List<Item>>:
List<List<Item>> items = new List<List<Item>>();
items.Add(list1);
items.Add(list2);
items.Add(list3);
items.Add(list4);

The core problem here is the segmentation (making the groups of 2 and 4 items) for which you will need MoreLinq. After that it's a simple .Zip() of lists.
using System.Linq;
using MoreLinq;
var segA = listA.Segment((x, i) => i % 2 == 0);
var segB = listB.Segment((x, i) => i % 4 == 0);
var result = segA.Zip(segB, (a, b) => a.Concat(b).ToList()).ToList();
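If an extra library is not an option, the same segmentation can be sketched with Enumerable.Chunk, which is part of System.Linq from .NET 6 onwards. This is an alternative to the MoreLinq answer above, not a restatement of it; Grouping and CombineInChunks are hypothetical names, and dropping incomplete trailing groups is an assumption about the desired behavior.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Grouping
{
    // Hypothetical helper: pairs chunks of sizeA items from listA
    // with chunks of sizeB items from listB.
    public static List<List<T>> CombineInChunks<T>(
        IEnumerable<T> listA, int sizeA, IEnumerable<T> listB, int sizeB)
    {
        return listA.Chunk(sizeA)                       // requires .NET 6 or later
            .Zip(listB.Chunk(sizeB), (a, b) => a.Concat(b).ToList())
            // assumption: a trailing group that could not be filled completely is dropped
            .Where(group => group.Count == sizeA + sizeB)
            .ToList();
    }
}
```

For 8 items in listA and 16 items in listB, Grouping.CombineInChunks(listA, 2, listB, 4) gives 4 lists of 6 items each.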

To the original question,
How can I create a List<List<T>> that will contain 2 items from listA and 4 items from listB?
var result = new List<List<T>> { listA.Take(2).ToList(), listB.Take(4).ToList() };

Based on what I understood from your question, you would need something like this.
You will have to figure out the "condition" used below based on the cases you have.
One case could be: after making the groups, 1 apple and 3 pears remain; what should be done with them?
public class AB<T>
{
public List<T> AList { get; set; }
public List<T> BList { get; set; }
}
//code in the function
List<AB<T>> lstAB = new List<AB<T>>();
int takeA = 2;
int takeB = 4;
int skipA = 0;
int skipB = 0;
// "condition" is the placeholder described above
while(condition)
{
lstAB.Add(new AB<T>()
{
AList = listA.Skip(skipA).Take(takeA).ToList(),
BList = listB.Skip(skipB).Take(takeB).ToList()
});
skipA += takeA;
skipB += takeB;
}

For what you want to achieve there is no LINQ 'out of the box' method.
You could use 3rd party libraries.
Or write your own extension method:
public static class Extensions
{
public static List<List<TEntity>> ToFixedSizeGroups<TEntity>(this IEnumerable<TEntity> list1, IEnumerable<TEntity> list2, int take1, int take2)
{
// check if the collection is a list already
var list1Enumerated = list1 as IList<TEntity> ?? list1.ToList();
var list2Enumerated = list2 as IList<TEntity> ?? list2.ToList();
// If we want to use a single for loop we need to know max-length
var longerList = list1Enumerated.Count > list2Enumerated.Count ? list1Enumerated : list2Enumerated;
var grouppedList1 = Enumerable.Range(0, list1Enumerated.Count / take1).Select(x => new List<TEntity>()).ToList();
var grouppedList2 = Enumerable.Range(0, list2Enumerated.Count / take2).Select(x => new List<TEntity>()).ToList();
for (var i = 0; i < longerList.Count; i++)
{
if (i < list1Enumerated.Count && i / take1 < grouppedList1.Count)
{
grouppedList1[i / take1].Add(list1Enumerated[i]);
}
if (i < list2Enumerated.Count && i / take2 < grouppedList2.Count)
{
grouppedList2[i / take2].Add(list2Enumerated[i]);
}
}
return grouppedList1.Where(x => x.Count == take1).Zip(grouppedList2.Where(x => x.Count == take2), (x, y) => x.Concat(y).ToList()).ToList();
}
}
Example of use:
List<string> a = new List<string> {"one", "two", "three", "one"};
List<string> b = new List<string> { "four", "five", "four", "five" };
List<List<string>> groups = a.ToFixedSizeGroups(b, 3, 2);

Get duplicate values from dictionary

I have a dictionary and I need to get its duplicate values.
For example:
Dictionary<int, List<string>> dictionary = new Dictionary<int, List<string>>();
List<string> list1 = new List<string> { "John", "Smith" };
List<string> list2 = new List<string> { "John", "Smith" };
List<string> list3 = new List<string> { "Mike", "Johnson" };
dictionary.Add(1, list1);
dictionary.Add(2, list2);
dictionary.Add(3, list3);
I need to find all duplicates in the dictionary and return the max key of each group of duplicate values (as a collection of keys). For my test dictionary, I need to get a list with only one key: 2.
Maybe I chose the wrong data structure. I would like an optimal algorithm.

With your current structure, you're in a bit of trouble because you don't necessarily have an easy way to compare two List<string> to see if they are equal.
One way to work around this is to create a custom List<string> comparer that implements IEqualityComparer<List<string>>. However, since you have a list of strings, we also need to reorder both to ensure that we are comparing each value in the correct order. This affects the cost of your algorithm. On the other hand, if you are happy with the order of the values inside of the lists, that works just fine as well and you can avoid that cost.
public class StringListComparer : IEqualityComparer<List<string>>
{
public bool Equals(List<string> x, List<string> y)
{
return CompareLists(x, y);
}
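public int GetHashCode(List<string> obj)
{
// order-independent hash, consistent with Equals (which sorts before comparing)
int hash = obj.Count;
foreach (var s in obj)
hash = unchecked(hash + (s?.GetHashCode() ?? 0));
return hash;
}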
private static bool CompareLists(List<string> x, List<string> y)
{
if (x.Count != y.Count)
return false;
// we HAVE to ensure that lists are in same order
// for a proper comparison
x = x.OrderBy(v => v).ToList();
y = y.OrderBy(v => v).ToList();
for (var i = 0; i < x.Count(); i++)
{
if (x[i] != y[i])
return false;
}
return true;
}
}
Once we have our comparer, we can use it to pull out keys from subsequent duplicates, leaving the first key (per your requirement).
public List<int> GetDuplicateKeys(Dictionary<int, List<string>> dictionary)
{
return dictionary
.OrderBy (x => x.Key)
.GroupBy(x => x.Value, new StringListComparer())
.Where (x => x.Count () > 1)
.Aggregate (
new List<int>(),
(destination, dict) =>
{
var first = dict.FirstOrDefault();
foreach (var kvp in dict)
{
if (!kvp.Equals(first))
destination.Add(kvp.Key);
}
return destination;
}
).ToList();
}
The following test outputs keys 2 and 4.
Dictionary<int, List<string>> dictionary = new Dictionary<int, List<string>>();
dictionary.Add(1, new List<string> { "John", "Smith" });
dictionary.Add(2, new List<string> { "John", "Smith" });
dictionary.Add(3, new List<string> { "Mike", "Johnson"});
dictionary.Add(4, new List<string> { "John", "Smith" });
var result = GetDuplicateKeys(dictionary);

You could create a list of KeyValuePair<int, List<string>>, that contains the ordered lists, with the outer list sorted. Then you could find duplicates very quickly. You'd need a list comparer that can compare ordered lists.
class MyListComparer: Comparer<List<string>>
{
public override int Compare(List<string> x, List<string> y)
{
for (var ix = 0; ix < x.Count && ix < y.Count; ++ix)
{
var rslt = x[ix].CompareTo(y[ix]);
if (rslt != 0)
{
return rslt;
}
}
// exhausted one of the lists.
// Compare the lengths.
return x.Count.CompareTo(y.Count);
}
}
var comparer = new MyListComparer();
var sortedList = dictionary.Select(kvp =>
new KeyValuePair<int, List<string>>(kvp.Key, kvp.Value.OrderBy(v => v).ToList()))
.OrderBy(kvp => kvp.Value, comparer)
.ThenBy(kvp => kvp.Key);
Note the ThenBy, which ensures that if two lists are equal, the one with the smaller key will appear first. This is necessary because, although OrderBy does a stable sort, there's no guarantee that enumerating the dictionary returned items in order by key.
// the lists are now sorted by value. So `"Smith, John"` will appear before `"Smith, William"`.
// We can go through the list sequentially to output duplicates.
var previousList = new List<string>();
foreach (var kvp in sortedList)
{
if (kvp.Value.SequenceEqual(previousList))
{
// this list is a duplicate
// Lookup the list using the key.
var dup = dictionary[kvp.Key];
// Do whatever you need to do with the dup
}
else
{
previousList = kvp.Value;
}
}
This sorts each list only once. It does use more memory, because it duplicates the dictionary in that list of KeyValuePair<int, List<string>>, but for larger data sets it should be much faster than sorting each list multiple times and comparing it against every other list.
Caveats: The code above assumes that none of the lists are null, and none of them are empty. If a list in the dictionary can be null or empty, then you'll have to add some special case code. But the general approach would be the same.

C# Ordering 2 lists randomly

I have been trying to figure out how to randomly order two lists in the same way, e.g.:
List<string> list = new List<string>();
list.Add("RedHat");
list.Add("BlueHat");
list.Add("YellowHat");
List<image> list2 = new List<image>();
list2.Add(Properties.Resources.RedHat);
list2.Add(Properties.Resources.BlueHat);
list2.Add(Properties.Resources.YellowHat);
Now, if I want to shuffle these so that "RedHat" and the RedHat image stay aligned, how can I do this? And is there a way to combine these lists and then shuffle them, using a dictionary, key-value pairs, or something along those lines?

Wrap the two in an object:
class WrapperObject {
public string Name { get; set; }
public object Resource { get; set; }
}
Add them to a list:
var list = new List<WrapperObject>();
list.Add(new WrapperObject() {
Name = "RedHat",
Resource = Properties.Resources.RedHat
});
Then randomize:
var rnd = new Random();
list = list.OrderBy(x => rnd.Next(50)).ToList();

Is there any specific reason why you want them in two lists? You could just create a list of KeyValuePairs like this:
var list = new List<KeyValuePair<string, image>> ();
list.Add(new KeyValuePair<string, image>("RedHat", (Properties.Resources.RedHat)));
list.Add(new KeyValuePair<string, image>("BlueHat", (Properties.Resources.BlueHat)));
list.Add(new KeyValuePair<string, image>("YellowHat", (Properties.Resources.YellowHat)));

You could store the data in a Tuple<,>, but if you have more than 2 elements it's worth creating an explicit class to store the data.
Tuple example:
List<Tuple<string, image>> list = new List<Tuple<string, image>>();
list.Add(new Tuple<string,image>("RedHat", Properties.Resources.RedHat));
// etc...

LINQ-fu version:
var rng = new Random();
var res = Enumerable.Zip(list, list2, (e1, e2) => new { e1, e2 })
.OrderBy(x => rng.Next())
.Aggregate(new { list1 = new List<string>(), list2 = new List<image>() },
(lists, next) =>
{
lists.list1.Add(next.e1);
lists.list2.Add(next.e2);
return lists;
});
list = res.list1;
list2 = res.list2;

The following code should do what you want:
var list1 = new List<string>
{
"RedHat",
"BlueHat",
"YellowHat"
};
var list2 = new List<int>
{
1,
2,
3
};
var combined = list1.Zip(list2, (a, b) => new { a, b }).Shuffle(new Random()).ToList();
list1 = combined.Select(i => i.a).ToList();
list2 = combined.Select(i => i.b).ToList();
You'll need the following extension method:
public static class ShuffleExtension
{
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
T[] elements = source.ToArray();
for (int i = elements.Length - 1; i >= 0; i--)
{
int swapIndex = rng.Next(i + 1);
yield return elements[swapIndex];
elements[swapIndex] = elements[i];
}
}
}

First put the corresponding elements together, then apply random order:
var rnd = new Random();
var ordered = list.Zip(list2, Tuple.Create).OrderBy(el => rnd.Next()).ToArray();
You can easily extract back the individual lists, if needed:
var ordered_list = ordered.Select(tuple => tuple.Item1).ToList();
var ordered_list2 = ordered.Select(tuple => tuple.Item2).ToList();
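If the two lists have to stay separate, another option is to shuffle them in place and apply every swap to both lists, so matching elements keep the same index. The following is a minimal sketch using a Fisher-Yates shuffle; ParallelShuffle and ShuffleTogether are hypothetical names, and the sketch assumes both lists have the same length.

```csharp
using System;
using System.Collections.Generic;

public static class ParallelShuffle
{
    // Hypothetical helper: reorders both lists in place with the same random permutation.
    public static void ShuffleTogether<TA, TB>(IList<TA> first, IList<TB> second, Random rng)
    {
        if (first.Count != second.Count)
            throw new ArgumentException("Both lists must have the same length.");

        // Fisher-Yates: swap position i with a random position in [0, i]
        for (int i = first.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            // apply the identical swap to both lists so the pairs stay aligned
            (first[i], first[j]) = (first[j], first[i]);
            (second[i], second[j]) = (second[j], second[i]);
        }
    }
}
```

A call such as ParallelShuffle.ShuffleTogether(list, list2, new Random()) needs no intermediate tuples and no additional lists.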

Combining lists in linq

In LINQ, is it possible to combine many lists (of the same type), such that two lists,
list 1 = {a,b,c} and list 2 = {x,y,z}
turns into {[1,a] , [1,b] , [1,c] , [2,x] , [2,y] , [2,z] }
where [] represents a pair containing a "list identifier"
The problem is from having decks of arbitrary cards, where each deck is a list in a collection of lists.
I'm trying to create a query such that I can select only cards in a certain deck, or cards similar to 2 or more decks.
This is probably a duplicate question, but I don't know how to search for it any further than I already have.

List<List<int>> lists;
var combined = lists.Select((l, idx) => new { List = l, Idx = idx })
.SelectMany(p => p.List.Select(i => Tuple.Create(p.Idx + 1, i)));

var list1 = new List<string>() { "a", "b", "c" };
var list2 = new List<string>() { "x", "y", "z" };
var combined = list1.Select(x => new { id = 1, v = x }).Concat(list2.Select(x => new { id = 2, v = x }));

Normally I'd suggest Enumerable.Zip for combining multiple lists, however you seem to actually want to concatenate multiple lists with a list counter.
public IEnumerable<Tuple<int,T>> Combine<T>(params IEnumerable<T>[] lists) {
return lists.Select((x,i) => x.Select(y => Tuple.Create(i+1,y))).SelectMany (l =>l);
}
UPDATE
Completely missed that SelectMany has the index option so the above code can be written as
public IEnumerable<Tuple<int,T>> Combine<T>(params IEnumerable<T>[] lists) {
return lists.SelectMany((x,i) => x.Select(y => Tuple.Create(i+1,y)));
}
Then you can do
var list1 = new List<string> { "a", "b", "c" };
var list2 = new List<string> { "x", "y", "z" };
var combined = Combine(list1,list2);
Combined will be enumerable of tuples, with Item1 being the list index identifier (starting at 1) and Item2 being the value.
This method will handle multiple lists so you could just as easily call it with:
var list3 = new List<string> { "f", "g" };
var combined = Combine(list1,list2,list3);
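Once every value carries its list identifier, the queries mentioned in the question (cards of one deck, or cards that occur in two or more decks) become a filter or a grouping over the tagged pairs. The following is a minimal sketch that repeats Combine so it is self-contained; DeckQueries and InAtLeast are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeckQueries
{
    // Tags every item with the 1-based number of the list it came from.
    public static IEnumerable<Tuple<int, T>> Combine<T>(params IEnumerable<T>[] lists)
    {
        return lists.SelectMany((list, i) => list.Select(item => Tuple.Create(i + 1, item)));
    }

    // Hypothetical helper: items that occur in at least minLists different lists.
    public static List<T> InAtLeast<T>(IEnumerable<Tuple<int, T>> tagged, int minLists)
    {
        return tagged
            .GroupBy(pair => pair.Item2)
            // count distinct list numbers, so repeats inside one list do not count twice
            .Where(group => group.Select(pair => pair.Item1).Distinct().Count() >= minLists)
            .Select(group => group.Key)
            .ToList();
    }
}
```

Selecting the cards of a single deck is then a plain filter, for example combined.Where(pair => pair.Item1 == 2).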

You can merge the lists like:
var first = new List<string> {"a","b","c"};
var second = new List<string> {"x","y","z"};
var merged = first.Select(item => new { ListIndex = 1, Value = item}).ToList();
merged.AddRange(second.Select(item => new { ListIndex = 2, Value = item }));
// or use Concat instead of the two statements above
var merged = first.Select(item => new { ListIndex = 1, Value = item })
.Concat(second.Select(item => new { ListIndex = 2, Value = item }));
Alternatively if you have the sources in something like:
List<List<string>> lists = new List<List<string>>
{
new List<string> {"a","b","c"},
new List<string> {"x","y","z"}
};
you can do:
var merged = lists.SelectMany((item, index) =>
item.Select(s => new { ListIndex = index, Value = s}));
Note that this will produce 0-based list indexes, so if you really need 1-based indexes, just use ListIndex = index + 1.
Also, if you will use this a lot, I would create a specific type for it, something like:
struct ListIdentValue
{
public int ListIndex {get; private set;}
public string Value {get; private set;}
public ListIdentValue(int listIndex, string value) {...}
}

Try using Concat
new[] {'a','b','c'}
.Select(v=>new Tuple<int,char>(1, v))
.Concat(
new[] {'x','y','z'}.Select(v=>new Tuple<int,char>(2, v))
)

string[] a = { "a", "b", "c" };
string[] b = { "x", "z", "y" };
var t =
(
from ai in a
select new { listNo = 1, Item = ai }
).Union
(
from bi in b
select new { listNo = 2, Item = bi }
);
or
var t =
(
from ai in a
select new object[] { 1, ai }
).Union
(
from bi in b
select new object[] { 2, bi }
);

Operation on overlapping elements on IEnumerable by LINQ

Let us consider an IEnumerable and an algorithm that takes pairs of overlapping indexes, e.g. {0, 1}, {1, 2}, {2, 3} etc., and creates a new collection based on the values at these indexes, e.g. {collection[0], collection[1] => result[0]}, {collection[1], collection[2] => result[1]} and so on. Below is an example of a straightforward implementation:
IEnumerable<int> collection = new int[100];
var array = collection.ToArray();
var results = array.Skip(1).Select((e, i) => e - array[i]);
How can this be achieved in a better way?

var array = new string[] { "one", "two", "three" };
var result = Enumerable.Range(1, array.Length - 1)
.Select(i => new[] { array[i - 1], array[i] });
Here is @TrustMe's solution with arrays instead of tuples (just to show you a sample, you should not accept my answer):
IEnumerable<string> collection = new string[] { "one", "two", "three" };
var result = collection.Zip(collection.Skip(1), (x,y) => new [] { x, y });
But keep in mind, that collection will be enumerated two times if you do not use access by index (with array or list).
UPDATE Here is an extension method, which will work with collection and will enumerate sequence only once:
public static class Extensions
{
public static IEnumerable<T[]> GetOverlappingPairs<T>(
this IEnumerable<T> source)
{
using (var enumerator = source.GetEnumerator())
{
// an empty source yields no pairs
if (!enumerator.MoveNext())
yield break;
var first = enumerator.Current;
while (enumerator.MoveNext())
{
var second = enumerator.Current;
yield return new T[] { first, second };
first = second;
}
}
}
}
Usage:
var result = collection.GetOverlappingPairs();

And here's another one:
var ints = Enumerable.Range(0, 10);
var paired = ints.Zip(ints.Skip(1), Tuple.Create);
That way you'll get the pairs {0,1}, {1,2} ...
I assume that's what you're asking for, because your code sample is a tad different than what you described... :)
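The code sample in the question computes a value from each adjacent pair rather than returning the pair itself. A minimal sketch that covers both cases with a result selector, while enumerating the source only once, could look like this; PairwiseExtensions and Pairwise are hypothetical names, not part of System.Linq.

```csharp
using System;
using System.Collections.Generic;

public static class PairwiseExtensions
{
    // Hypothetical helper: applies selector to each pair of adjacent elements,
    // enumerating source exactly once.
    public static IEnumerable<TResult> Pairwise<T, TResult>(
        this IEnumerable<T> source, Func<T, T, TResult> selector)
    {
        using (var enumerator = source.GetEnumerator())
        {
            if (!enumerator.MoveNext())
                yield break; // empty source: no pairs

            var previous = enumerator.Current;
            while (enumerator.MoveNext())
            {
                var current = enumerator.Current;
                yield return selector(previous, current);
                previous = current;
            }
        }
    }
}
```

For example, new[] { 1, 4, 9, 16 }.Pairwise((a, b) => b - a) yields the differences 3, 5, 7, and passing Tuple.Create as the selector yields the overlapping pairs themselves.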

var result = Enumerable.Range(1, arrayCollection.Length - 1)
.Select(i => new[] {arrayCollection[i - 1], arrayCollection[i]});
If arrayCollection is IEnumerable
var result = Enumerable.Range(1, arrayCollection.Count() - 1)
.Select(i => new[] {
arrayCollection.ElementAt(i - 1),
arrayCollection.ElementAt(i)
});
