Related
Suppose I have a list of strings [city01, city01002, state02, state03, city04, statebg, countryqw, countrypo]
How do I group them in a dictionary of <string, List<Strings>> like
city - [city01, city04, city01002]
state- [state02, state03, statebg]
country - [countrywq, countrypo]
If not code, can anyone please help with how to approach or proceed?
As shown in other answers you can use the GroupBy method from LINQ to create this grouping based on any condition you want. Before you can group your strings you need to know the conditions for how a string is grouped. It could be that it starts with one of a set of predefined prefixes, grouped by whats before the first digit or any random condition you can describe with code. In my code example the groupBy method calls another method for every string in your list and in that method you can place the code you need to group the strings as you want by returning the key to group the given string under. You can test this example online with dotnetfiddle: https://dotnetfiddle.net/UHNXvZ
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
public static void Main()
{
List<string> ungroupedList = new List<string>() {"city01", "city01002", "state02", "state03", "city04", "statebg", "countryqw", "countrypo", "theFirstTown"};
var groupedStrings = ungroupedList.GroupBy(x => groupingCondition(x));
foreach (var a in groupedStrings) {
Console.WriteLine("key: " + a.Key);
foreach (var b in a) {
Console.WriteLine("value: " + b);
}
}
}
public static string groupingCondition(String s) {
if(s.StartsWith("city") || s.EndsWith("Town"))
return "city";
if(s.StartsWith("country"))
return "country";
if(s.StartsWith("state"))
return "state";
return "unknown";
}
}
You can use LINQ:
var input = new List<string>()
{ "city01", "city01002", "state02",
"state03", "city04", "statebg", "countryqw", "countrypo" };
var output = input.GroupBy(c => string.Join("", c.TakeWhile(d => !char.IsDigit(d))
.Take(4))).ToDictionary(c => c.Key, c => c.ToList());
i suppose you have a list of references you are searching in the list:
var list = new List<string>()
{ "city01", "city01002", "state02",
"state03", "city04", "statebg", "countryqw", "countrypo" };
var tofound = new List<string>() { "city", "state", "country" }; //references to found
var result = new Dictionary<string, List<string>>();
foreach (var f in tofound)
{
result.Add(f, list.FindAll(x => x.StartsWith(f)));
}
In the result, you have the dictionary wanted. If no value are founded for a reference key, the value of key is null
Warning: This answer has a combinatorial expansion and will fail if your original string set is large. For 65 words I gave up after running for a couple of hours.
Using some IEnumerable extension methods to find Distinct sets and to find all possible combinations of sets, you can generate a group of prefixes and then group the original strings by these.
public static class IEnumerableExt {
public static bool IsDistinct<T>(this IEnumerable<T> items) {
var hs = new HashSet<T>();
foreach (var item in items)
if (!hs.Add(item))
return false;
return true;
}
public static bool IsEmpty<T>(this IEnumerable<T> items) => !items.Any();
public static IEnumerable<IEnumerable<T>> AllCombinations<T>(this IEnumerable<T> start) {
IEnumerable<IEnumerable<T>> HelperCombinations(IEnumerable<T> items) {
if (items.IsEmpty())
yield return items;
else {
var head = items.First();
var tail = items.Skip(1);
foreach (var sequence in HelperCombinations(tail)) {
yield return sequence; // Without first
yield return sequence.Prepend(head);
}
}
}
return HelperCombinations(start).Skip(1); // don't return the empty set
}
}
var keys = Enumerable.Range(0, src.Count - 1)
.SelectMany(n1 => Enumerable.Range(n1 + 1, src.Count - n1 - 1).Select(n2 => new { n1, n2 }))
.Select(n1n2 => new { s1 = src[n1n2.n1], s2 = src[n1n2.n2], Dist = src[n1n2.n1].TakeWhile((ch, n) => n < src[n1n2.n2].Length && ch == src[n1n2.n2][n]).Count() })
.SelectMany(s1s2d => new[] { new { s = s1s2d.s1, s1s2d.Dist }, new { s = s1s2d.s2, s1s2d.Dist } })
.Where(sd => sd.Dist > 0)
.GroupBy(sd => sd.s.Substring(0, sd.Dist))
.Select(sdg => sdg.Distinct())
.AllCombinations()
.Where(sdgc => sdgc.Sum(sdg => sdg.Count()) == src.Count)
.Where(sdgc => sdgc.SelectMany(sdg => sdg.Select(sd => sd.s)).IsDistinct())
.OrderByDescending(sdgc => sdgc.Sum(sdg => sdg.First().Dist)).First()
.Select(sdg => sdg.First())
.Select(sd => sd.s.Substring(0, sd.Dist))
.ToList();
var groups = src.GroupBy(s => keys.First(k => s.StartsWith(k)));
I have an unknown number of buckets(collections), and each bucket having an unknown number of entities
I need to produce a cartesian product of all the entities, so that I endup with a single COLLECTION that has ARRAYS of entities and in each array, there is 1 representetive from EVERY bucket.
So that if I have 5 buckets (B1..B5), and buckets B1, B2 have 1 item each, and bucket B3, B4 and B5 have 4, 8 and 10 items each, I'll have a collection of 320 arrays, and each array will have 5 items.
The only stupud issue here, is that both size of buckets and number of buckets is unknown at development time.
Performance is not super important here, as most of the time, my buckets will have only 1 entity, and only rarely will there be times when some of my buckets will contain 20-30 items...and I'll usually have 5-30 buckets
I'd love to utilize linq here in someway, but my brain is getting fried as I try to imagine how this would work
You could create an extension method like the following:
public static class EnumerableExtensions
{
public static IEnumerable<TValue []> Permutations<TKey, TValue>(this IEnumerable<TKey> keys, Func<TKey, IEnumerable<TValue>> selector)
{
var keyArray = keys.ToArray();
if (keyArray.Length < 1)
yield break;
TValue [] values = new TValue[keyArray.Length];
foreach (var array in Permutations(keyArray, 0, selector, values))
yield return array;
}
static IEnumerable<TValue []> Permutations<TKey, TValue>(TKey [] keys, int index, Func<TKey, IEnumerable<TValue>> selector, TValue [] values)
{
Debug.Assert(keys.Length == values.Length);
var key = keys[index];
foreach (var value in selector(key))
{
values[index] = value;
if (index < keys.Length - 1)
{
foreach (var array in Permutations(keys, index+1, selector, values))
yield return array;
}
else
{
yield return values.ToArray(); // Clone the array;
}
}
}
}
As an example, it could be used like:
public static void TestPermutations()
{
int [][] seqence = new int [][]
{
new int [] {1, 2, 3},
new int [] {101},
new int [] {201},
new int [] {301, 302, 303},
};
foreach (var array in seqence.Permutations(a => a))
{
Debug.WriteLine(array.Aggregate(new StringBuilder(), (sb, i) => { if (sb.Length > 0) sb.Append(","); sb.Append(i); return sb; }));
}
}
and produce the following output:
1,101,201,301
1,101,201,302
1,101,201,303
2,101,201,301
2,101,201,302
2,101,201,303
3,101,201,301
3,101,201,302
3,101,201,303
Is that what you want?
Here's how to do it without recursion in a single Linq statement (wrapped around an extension method for convenience):
public static IEnumerable<IEnumerable<T>> GetPermutations<T>(
IEnumerable<IEnumerable<T>> listOfLists)
{
return listOfLists.Skip(1)
.Aggregate(listOfLists.First()
.Select(c => new List<T>() { c }),
(previous, next) => previous
.SelectMany(p => next.Select(d => new List<T>(p) { d })));
}
The idea is simple:
Skip the first row, so we can use it as the initial value of an aggregate.
Place this initial value in a list that we'll grow on each iteration.
On each iteration, create a new list for each element in previous and add to it each of the elements in next (this is done by new List<T>(p) { d }).
EXAMPLE
Suppose you have an array of arrays as follows:
var arr = new[] {
new[] { 1,2 },
new[] { 10,11,12 },
new[] { 100,101 }
};
Then arr.GetPermutations() will return a list of lists containing:
1,10,100
1,10,101
1,11,100
1,11,101
1,12,100
1,12,101
2,10,100
2,10,101
2,11,100
2,11,101
2,12,100
2,12,101
Non-Linq, non-recursive solution that's faster. We pre-allocate the entire output matrix and then just fill it in a column at a time.
T[][] Permutations<T>(T[][] vals)
{
int numCols = vals.Length;
int numRows = vals.Aggregate(1, (a, b) => a * b.Length);
var results = Enumerable.Range(0, numRows)
.Select(c => new T[numCols])
.ToArray();
int repeatFactor = 1;
for (int c = 0; c < numCols; c++)
{
for (int r = 0; r < numRows; r++)
results[r][c] = vals[c][r / repeatFactor % vals[c].Length];
repeatFactor *= vals[c].Length;
}
return results;
}
Another LINQ-based option, based on suggestion from Diego, but more precise in terms of argument and return types.
It also does not require multiple enumerations of outer collection and hence does not produce Resharper's hint "Possible multiple enumerations".
public static IEnumerable<IReadOnlyCollection<T>> GetPermutations<T>(
IEnumerable<IReadOnlyCollection<T>> collections) =>
collections
.Aggregate(
new[] { Array.Empty<T>() },
(acc, next) =>
acc
.SelectMany(accItem =>
next.Select(nextItem => accItem.Concat(new[] { nextItem }).ToArray()))
.ToArray());
This is probably a very late answer, but I encounter a similar problem i.e. generate all permutations of a list of list of string. However, in my problem, I don't need all permutations simultaneously. I only need/generate next permutation if current permutation doesn't satisfy my condition. Therefore, the following is my way of doing a kind of "for each" and with conditional continuation during permutations generation. This answer is inpsired by Tom19's answer.
void ForEachPermutationDo<T>(IEnumerable<IEnumerable<T>> listOfList, Func<IEnumerable<T>, bool> whatToDo) {
var numCols = listOfList.Count();
var numRows = listOfList.Aggregate(1, (a, b) => a * b.Count());
var continueGenerating = true;
var permutation = new List<T>();
for (var r = 0; r < numRows; r++) {
var repeatFactor = 1;
for (var c = 0; c < numCols; c++) {
var aList = listOfList.ElementAt(c);
permutation.Add(aList.ElementAt((r / repeatFactor) % aList.Count()));
repeatFactor *= aList.Count();
}
continueGenerating = whatToDo(permutation.ToList()); // send duplicate
if (!continueGenerating) break;
permutation.Clear();
}
}
Using the above method, generating all permutation can be done like
IEnumerable<IEnumerable<T>> GenerateAllPermutations<T>(IEnumerable<IEnumerable<T>> listOfList) {
var results = new List<List<T>>();
ForEachPermutationDo(listOfList, (permutation) => {
results.Add(permutation);
return true;
});
return results;
}
When i have a list
IList<int> list = new List<int>();
list.Add(100);
list.Add(200);
list.Add(300);
list.Add(400);
list.Add(500);
What is the way to extract a pairs
Example : List elements {100,200,300,400,500}
Expected Pair : { {100,200} ,{200,300} ,{300,400} ,{400,500} }
The most elegant way with LINQ: list.Zip(list.Skip(1), Tuple.Create)
A real-life example: This extension method takes a collection of points (Vector2) and produces a collection of lines (PathSegment) needed to 'join the dots'.
static IEnumerable<PathSegment> JoinTheDots(this IEnumerable<Vector2> dots)
{
var segments = dots.Zip(dots.Skip(1), (a,b) => new PathSegment(a, b));
return segments;
}
This will give you an array of anonymous "pair" objects with A and B properties corresponding to the pair elements.
var pairs = list.Where( (e,i) => i < list.Count - 1 )
.Select( (e,i) => new { A = e, B = list[i+1] } );
You can use a for loop:
var pairs = new List<int[]>();
for(int i = 0; i < list.Length - 1; i++)
pairs.Add(new [] {list[i], list[i + 1]);
You can also use LINQ, but it's uglier:
var pairs = list.Take(list.Count - 1).Select((n, i) => new [] { n, list[i + 1] });
EDIT: You can even do it on a raw IEnumerable, but it's much uglier:
var count = list.Count();
var pairs = list
.SelectMany((n, i) => new [] { new { Index = i - 1, Value = n }, new { Index = i, Value = n } })
.Where(ivp => ivp.Index >= 0 && ivp.Index < count - 1) //We only want one copy of the first and last value
.GroupBy(ivp => ivp.Index, (i, ivps) => ivps.Select(ivp => ivp.Value));
More general would be:
public static IEnumerable<TResult> Pairwise<TSource, TResult>(this IEnumerable<TSource> values, int count, Func<TSource[], TResult> pairCreator)
{
if (count < 1) throw new ArgumentOutOfRangeException("count");
if (values == null) throw new ArgumentNullException("values");
if (pairCreator == null) throw new ArgumentNullException("pairCreator");
int c = 0;
var data = new TSource[count];
foreach (var item in values)
{
if (c < count)
data[c++] = item;
if (c == count)
{
yield return pairCreator(data);
c = 0;
}
}
}
Following solution uses zip method. Zip originalList and originalList.Skip(1) so that one gets desired result.
var adjacents =
originalList.Zip(originalList.Skip(1),
(a,b) => new {N1 = a, N2 = b});
Using .Windowed() from MoreLINQ:
var source = new[] {100,200,300,400,500};
var result = source.Windowed(2).Select(x => Tuple.Create(x.First(),x.Last()));
Off the top of my head and completely untested:
public static T Pairwise<T>(this IEnumerable<T> list)
{
T last;
bool firstTime = true;
foreach(var item in list)
{
if(!firstTime)
return(Tuple.New(last, item));
else
firstTime = false;
last = item;
}
}
How to convert a String[] to an IDictionary<String, String>?
The values at the indices 0,2,4,... shall be keys, and consequently values at the indices 1,3,5,... shall be values.
Example:
new[] { "^BI", "connectORCL", "^CR", "connectCR" }
=>
new Dictionary<String, String> {{"^BI", "connectORCL"}, {"^CR", "connectCR"}};
I'd recommend a good old for loop for clarity. But if you insist on a LINQ query, this should work:
var dictionary = Enumerable.Range(0, array.Length/2)
.ToDictionary(i => array[2*i], i => array[2*i+1])
Dictionary<string,string> ArrayToDict(string[] arr)
{
if(arr.Length%2!=0)
throw new ArgumentException("Array doesn't contain an even number of entries");
Dictionary<string,string> dict=new Dictionary<string,string>();
for(int i=0;i<arr.Length/2;i++)
{
string key=arr[2*i];
string value=arr[2*i+1];
dict.Add(key,value);
}
return dict;
}
There's really no easy way to do this in LINQ (And even if there were, it's certainly not going to be clear as to the intent). It's easily accomplished by a simple loop though:
// This code assumes you can guarantee your array to always have an even number
// of elements.
var array = new[] { "^BI", "connectORCL", "^CR", "connectCR" };
var dict = new Dictionary<string, string>();
for(int i=0; i < array.Length; i+=2)
{
dict.Add(array[i], array[i+1]);
}
Something like this maybe:
string[] keyValues = new string[20];
Dictionary<string, string> dict = new Dictionary<string, string>();
for (int i = 0; i < keyValues.Length; i+=2)
{
dict.Add(keyValues[i], keyValues[i + 1]);
}
Edit: People in the C# tag are damn fast...
If you have Rx as a dependency you can do:
strings
.BufferWithCount(2)
.ToDictionary(
buffer => buffer.First(), // key selector
buffer => buffer.Last()); // value selector
BufferWithCount(int count) takes the first count values from the input sequence and yield them as a list, then it takes the next count values and so on. I.e. from your input sequence you will get the pairs as lists: {"^BI", "connectORCL"}, {"^CR", "connectCR"}, the ToDictionary then takes the first list item as key and the last ( == second for lists of two items) as value.
However, if you don't use Rx, you can use this implementation of BufferWithCount:
static class EnumerableX
{
public static IEnumerable<IList<T>> BufferWithCount<T>(this IEnumerable<T> source, int count)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
if (count <= 0)
{
throw new ArgumentOutOfRangeException("count");
}
var buffer = new List<T>();
foreach (var t in source)
{
buffer.Add(t);
if (buffer.Count == count)
{
yield return buffer;
buffer = new List<T>();
}
}
if (buffer.Count > 0)
{
yield return buffer;
}
}
}
It looks like other people have already beaten me to it and/or have more efficient answers but I'm posting 2 ways:
A for loop might be the clearest way to accomplish in this case...
var words = new[] { "^BI", "connectORCL", "^CR", "connectCR" };
var final = words.Where((w, i) => i % 2 == 0)
.Select((w, i) => new[] { w, words[(i * 2) + 1] })
.ToDictionary(arr => arr[0], arr => arr[1])
;
final.Dump();
//alternate way using zip
var As = words.Where((w, i) => i % 2 == 0);
var Bs = words.Where((w, i) => i % 2 == 1);
var dictionary = new Dictionary<string, string>(As.Count());
var pairs = As.Zip(Bs, (first, second) => new[] {first, second})
.ToDictionary(arr => arr[0], arr => arr[1])
;
pairs.Dump();
FYI, this is what I ended up with using a loop and implementing it as an extension method:
internal static Boolean IsEven(this Int32 #this)
{
return #this % 2 == 0;
}
internal static IDictionary<String, String> ToDictionary(this String[] #this)
{
if (!#this.Length.IsEven())
throw new ArgumentException( "Array doesn't contain an even number of entries" );
var dictionary = new Dictionary<String, String>();
for (var i = 0; i < #this.Length; i += 2)
{
var key = #this[i];
var value = #this[i + 1];
dictionary.Add(key, value);
}
return dictionary;
}
Pure Linq
Select : Project original string value and its index.
GroupBy : Group adjacent pairs.
Convert each group into dictionary entry.
string[] arr = new string[] { "^BI", "connectORCL", "^CR", "connectCR" };
var dictionary = arr.Select((value,i) => new {Value = value,Index = i})
.GroupBy(value => value.Index / 2)
.ToDictionary(g => g.FirstOrDefault().Value,
g => g.Skip(1).FirstOrDefault().Value);
I have several huge sorted enumerable sequences that I want to merge. Theses lists are manipulated as IEnumerable but are already sorted. Since input lists are sorted, it should be possible to merge them in one trip, without re-sorting anything.
I would like to keep the defered execution behavior.
I tried to write a naive algorithm which do that (see below). However, it looks pretty ugly and I'm sure it can be optimized. It may exist a more academical algorithm...
IEnumerable<T> MergeOrderedLists<T, TOrder>(IEnumerable<IEnumerable<T>> orderedlists,
Func<T, TOrder> orderBy)
{
var enumerators = orderedlists.ToDictionary(l => l.GetEnumerator(), l => default(T));
IEnumerator<T> tag = null;
var firstRun = true;
while (true)
{
var toRemove = new List<IEnumerator<T>>();
var toAdd = new List<KeyValuePair<IEnumerator<T>, T>>();
foreach (var pair in enumerators.Where(pair => firstRun || tag == pair.Key))
{
if (pair.Key.MoveNext())
toAdd.Add(pair);
else
toRemove.Add(pair.Key);
}
foreach (var enumerator in toRemove)
enumerators.Remove(enumerator);
foreach (var pair in toAdd)
enumerators[pair.Key] = pair.Key.Current;
if (enumerators.Count == 0)
yield break;
var min = enumerators.OrderBy(t => orderBy(t.Value)).FirstOrDefault();
tag = min.Key;
yield return min.Value;
firstRun = false;
}
}
The method can be used like that:
// Person lists are already sorted by age
MergeOrderedLists(orderedList, p => p.Age);
assuming the following Person class exists somewhere:
public class Person
{
public int Age { get; set; }
}
Duplicates should be conserved, we don't care about their order in the new sequence. Do you see any obvious optimization I could use?
Here is my fourth (thanks to #tanascius for pushing this along to something much more LINQ) cut at it:
public static IEnumerable<T> MergePreserveOrder3<T, TOrder>(
this IEnumerable<IEnumerable<T>> aa,
Func<T, TOrder> orderFunc)
where TOrder : IComparable<TOrder>
{
var items = aa.Select(xx => xx.GetEnumerator()).Where(ee => ee.MoveNext())
.OrderBy(ee => orderFunc(ee.Current)).ToList();
while (items.Count > 0)
{
yield return items[0].Current;
var next = items[0];
items.RemoveAt(0);
if (next.MoveNext())
{
// simple sorted linear insert
var value = orderFunc(next.Current);
var ii = 0;
for ( ; ii < items.Count; ++ii)
{
if (value.CompareTo(orderFunc(items[ii].Current)) <= 0)
{
items.Insert(ii, next);
break;
}
}
if (ii == items.Count) items.Add(next);
}
else next.Dispose(); // woops! can't forget IDisposable
}
}
Results:
for (int p = 0; p < people.Count; ++p)
{
Console.WriteLine("List {0}:", p + 1);
Console.WriteLine("\t{0}", String.Join(", ", people[p].Select(x => x.Name)));
}
Console.WriteLine("Merged:");
foreach (var person in people.MergePreserveOrder(pp => pp.Age))
{
Console.WriteLine("\t{0}", person.Name);
}
List 1:
8yo, 22yo, 47yo, 49yo
List 2:
35yo, 47yo, 60yo
List 3:
28yo, 55yo, 64yo
Merged:
8yo
22yo
28yo
35yo
47yo
47yo
49yo
55yo
60yo
64yo
Improved with .Net 4.0's Tuple support:
public static IEnumerable<T> MergePreserveOrder4<T, TOrder>(
this IEnumerable<IEnumerable<T>> aa,
Func<T, TOrder> orderFunc) where TOrder : IComparable<TOrder>
{
var items = aa.Select(xx => xx.GetEnumerator())
.Where(ee => ee.MoveNext())
.Select(ee => Tuple.Create(orderFunc(ee.Current), ee))
.OrderBy(ee => ee.Item1).ToList();
while (items.Count > 0)
{
yield return items[0].Item2.Current;
var next = items[0];
items.RemoveAt(0);
if (next.Item2.MoveNext())
{
var value = orderFunc(next.Item2.Current);
var ii = 0;
for (; ii < items.Count; ++ii)
{
if (value.CompareTo(items[ii].Item1) <= 0)
{ // NB: using a tuple to minimize calls to orderFunc
items.Insert(ii, Tuple.Create(value, next.Item2));
break;
}
}
if (ii == items.Count) items.Add(Tuple.Create(value, next.Item2));
}
else next.Item2.Dispose(); // woops! can't forget IDisposable
}
}
One guess I would make that might improve clarity and performance is this:
Create a priority queue over pairs of T, IEnumerable<T> ordered according to your comparison function on T
For each IEnumerable<T> being merged, add the item to the priority queue annotated with a reference to the IEnumerable<T> where it originated
While the priority queue is not empty
Extract the minimum element from the priority queue
Advance the IEnumerable<T> in its annotation to the next element
If MoveNext() returned true, add the next element to the priority queue annotated with a reference to the IEnumerable<T> you just advanced
If MoveNext() returned false, don't add anything to the priority queue
Yield the dequeued element
How many lists do you expect to need to merge? It looks like your algorithm will not be efficient if you have many different lists to merge. This line is the issue:
var min = enumerators.OrderBy(t => orderBy(t.Value)).FirstOrDefault();
This will be run once for each element in all the lists, so your runtime will be O(n * m), where n is the TOTAL number of elements in all the lists, and n is the number of lists. Expressed in terms of the average length of a list in the list of lists, the runtime is O(a * m^2).
If you are going to need to merge a lot of lists, I would suggest using a heap. Then each iteration you can remove the smallest value from the heap, and add the next element to the heap from the list that the smallest value came from.
Here's a solution with NO SORTING ... just the minimum number of comparisons. (I omitted the actual order func passing for simplicity). Updated to build a balanced tree:-
/// <summary>
/// Merge a pair of ordered lists
/// </summary>
public static IEnumerable<T> Merge<T>(IEnumerable<T> aList, IEnumerable<T> bList)
where T:IComparable<T>
{
var a = aList.GetEnumerator();
bool aOK = a.MoveNext();
foreach (var b in bList)
{
while (aOK && a.Current.CompareTo(b) <= 0) {yield return a.Current; aOK = a.MoveNext();}
yield return b;
}
// And anything left in a
while (aOK) { yield return a.Current; aOK = a.MoveNext(); }
}
/// <summary>
/// Merge lots of sorted lists
/// </summary>
public static IEnumerable<T> Merge<T>(IEnumerable<IEnumerable<T>> listOfLists)
where T : IComparable<T>
{
int n = listOfLists.Count();
if (n < 2)
return listOfLists.FirstOrDefault();
else
return Merge (Merge(listOfLists.Take(n/2)), Merge(listOfLists.Skip(n/2)));
}
public static void Main(string[] args)
{
var sample = Enumerable.Range(1, 5).Select((i) => Enumerable.Range(i, i+5).Select(j => string.Format("Test {0:00}", j)));
Console.WriteLine("Merged:");
foreach (var result in Merge(sample))
{
Console.WriteLine("\t{0}", result);
}
Here is a solution that has very good complexity analysis and that is considerably shorter than the other solutions proposed.
public static IEnumerable<T> Merge<T>(this IEnumerable<IEnumerable<T>> self)
where T : IComparable<T>
{
var es = self.Select(x => x.GetEnumerator()).Where(e => e.MoveNext());
var tmp = es.ToDictionary(e => e.Current);
var dict = new SortedDictionary<T, IEnumerator<T>>(tmp);
while (dict.Count > 0)
{
var key = dict.Keys.First();
var cur = dict[key];
dict.Remove(key);
yield return cur.Current;
if (cur.MoveNext())
dict.Add(cur.Current, cur);
}
}
Here is my solution:
The algorithm takes the first element of each list and puts them within a small helper class (a sorted list that accepts mutliple elements with the same value). This sorted list uses a binary insert.
So the first element in this list is the element we want to return next. After doing so we remove it from the sorted list and insert the next element from its original source list (at least as long as this list contains any more elements). Again, we can return the first element of our sorted list. When the sorted list is empty once, we used all element from all different source lists and are done.
This solution uses less foreach statements and no OrderBy in each step - which should improve the runtime behaviour. Only the binary insert has to be done again and again.
IEnumerable<T> MergeOrderedLists<T, TOrder>( IEnumerable<IEnumerable<T>> orderedlists, Func<T, TOrder> orderBy )
{
// Get an enumerator for each list, create a sortedList
var enumerators = orderedlists.Select( enumerable => enumerable.GetEnumerator() );
var sortedEnumerators = new SortedListAllowingDoublets<TOrder, IEnumerator<T>>();
// Point each enumerator onto the first element
foreach( var enumerator in enumerators )
{
// Missing: assert true as the return value
enumerator.MoveNext();
// Initially add the first value
sortedEnumerators.AddSorted( orderBy( enumerator.Current ), enumerator );
}
// Continue as long as we have elements to return
while( sortedEnumerators.Count != 0 )
{
// The first element of the sortedEnumerator list always
// holds the next element to return
var enumerator = sortedEnumerators[0].Value;
// Return this enumerators current value
yield return enumerator.Current;
// Remove the element we just returned
sortedEnumerators.RemoveAt( 0 );
// Check if there is another element in the list of the enumerator
if( enumerator.MoveNext() )
{
// Ok, so add it to the sorted list
sortedEnumerators.AddSorted( orderBy( enumerator.Current ), enumerator );
}
}
My helper class (using a simple binary insert):
private class SortedListAllowingDoublets<TOrder, T> : Collection<KeyValuePair<TOrder, T>> where T : IEnumerator
{
public void AddSorted( TOrder value, T enumerator )
{
Insert( GetSortedIndex( value, 0, Count - 1 ), new KeyValuePair<TOrder, T>( value, enumerator ) );
}
private int GetSortedIndex( TOrder item, int startIndex, int endIndex )
{
if( startIndex > endIndex )
{
return startIndex;
}
var midIndex = startIndex + ( endIndex - startIndex ) / 2;
return Comparer<TOrder>.Default.Compare( this[midIndex].Key, item ) < 0 ? GetSortedIndex( item, midIndex + 1, endIndex ) : GetSortedIndex( item, startIndex, midIndex - 1 );
}
}
What's not implemented right now: check for an empty list, which will cause problems.
And the SortedListAllowingDoublets class could be improved to take a comparer instead of using the Comparer<TOrder>.Default on its own.
Here is a Linq friendly solution based on the Wintellect's OrderedBag:
public static IEnumerable<T> MergeOrderedLists<T, TOrder>(this IEnumerable<IEnumerable<T>> orderedLists, Func<T, TOrder> orderBy)
where TOrder : IComparable<TOrder>
{
var enumerators = new OrderedBag<IEnumerator<T>>(orderedLists
.Select(enumerable => enumerable.GetEnumerator())
.Where(enumerator => enumerator.MoveNext()),
(x, y) => orderBy(x.Current).CompareTo(orderBy(y.Current)));
while (enumerators.Count > 0)
{
IEnumerator<T> minEnumerator = enumerators.RemoveFirst();
T minValue = minEnumerator.Current;
if (minEnumerator.MoveNext())
enumerators.Add(minEnumerator);
else
minEnumerator.Dispose();
yield return minValue;
}
}
If you use any Enumerator based solution, don't forget to call Dispose()
And here is a simple test:
[Test]
public void ShouldMergeInOrderMultipleOrderedListWithDuplicateValues()
{
// given
IEnumerable<IEnumerable<int>> orderedLists = new[]
{
new [] {1, 5, 7},
new [] {1, 2, 4, 6, 7}
};
// test
IEnumerable<int> merged = orderedLists.MergeOrderedLists(i => i);
// expect
merged.ShouldAllBeEquivalentTo(new [] { 1, 1, 2, 4, 5, 6, 7, 7 });
}
My version of sixlettervariables's answer. I reduced the number of calls to orderFunc (each element only passes through orderFunc once), and in the case of ties, sorting is skipped. This is optimized for small numbers of sources, larger numbers of elements within each source and possibly an expensive orderFunc.
public static IEnumerable<T> MergePreserveOrder<T, TOrder>(
this IEnumerable<IEnumerable<T>> sources,
Func<T, TOrder> orderFunc)
where TOrder : IComparable<TOrder>
{
Dictionary<TOrder, List<IEnumerable<T>>> keyedSources =
sources.Select(source => source.GetEnumerator())
.Where(e => e.MoveNext())
.GroupBy(e => orderFunc(e.Current))
.ToDictionary(g => g.Key, g => g.ToList());
while (keyedSources.Any())
{
//this is the expensive line
KeyValuePair<TOrder, List<IEnumerable<T>>> firstPair = keyedSources
.OrderBy(kvp => kvp.Key).First();
keyedSources.Remove(firstPair.Key);
foreach(IEnumerable<T> e in firstPair.Value)
{
yield return e.Current;
if (e.MoveNext())
{
TOrder newKey = orderFunc(e.Current);
if (!keyedSources.ContainsKey(newKey))
{
keyedSources[newKey] = new List<IEnumerable<T>>() {e};
}
else
{
keyedSources[newKey].Add(e);
}
}
}
}
}
I'm betting this could be further improved by a SortedDictionary, but am not brave enough to try a solution using one without an editor.
Here is a modern implementation that is based on the new and powerful PriorityQueue<TElement, TPriority> class (.NET 6). It combines the low overhead of user7116's solution, with the O(log n) complexity of tanascius's solution (where N is the number of sources). It outperforms most of the other implementations presented in this question (I didn't measure them all), either slightly for small N, or massively for large N.
public static IEnumerable<TSource> MergeSorted<TSource, TKey>(
this IEnumerable<IEnumerable<TSource>> sortedSources,
Func<TSource, TKey> keySelector,
IComparer<TKey> comparer = default)
{
List<IEnumerator<TSource>> enumerators = new();
try
{
foreach (var source in sortedSources)
enumerators.Add(source.GetEnumerator());
var queue = new PriorityQueue<IEnumerator<TSource>, TKey>(comparer);
foreach (var enumerator in enumerators)
{
if (enumerator.MoveNext())
queue.Enqueue(enumerator, keySelector(enumerator.Current));
}
while (queue.TryPeek(out var enumerator, out _))
{
yield return enumerator.Current;
if (enumerator.MoveNext())
queue.EnqueueDequeue(enumerator, keySelector(enumerator.Current));
else
queue.Dequeue();
}
}
finally
{
foreach (var enumerator in enumerators) enumerator.Dispose();
}
}
In order to keep the code simple, all enumerators are disposed at the end of the combined enumeration. A more sophisticated implementation would dispose each enumerator immediately after its completion.
This looks like a terribly useful function to have around so i decided to take a stab at it. My approach is a lot like heightechrider in that it breaks the problem down into merging two sorted IEnumerables into one, then taking that one and merging it with the next in the list. There is most likely some optimization you can do but it works with my simple testcase:
public static IEnumerable<T> mergeSortedEnumerables<T>(
this IEnumerable<IEnumerable<T>> listOfLists,
Func<T, T, Boolean> func)
{
IEnumerable<T> l1 = new List<T>{};
foreach (var l in listOfLists)
{
l1 = l1.mergeTwoSorted(l, func);
}
foreach (var t in l1)
{
yield return t;
}
}
public static IEnumerable<T> mergeTwoSorted<T>(
this IEnumerable<T> l1,
IEnumerable<T> l2,
Func<T, T, Boolean> func)
{
using (var enumerator1 = l1.GetEnumerator())
using (var enumerator2 = l2.GetEnumerator())
{
bool enum1 = enumerator1.MoveNext();
bool enum2 = enumerator2.MoveNext();
while (enum1 || enum2)
{
T t1 = enumerator1.Current;
T t2 = enumerator2.Current;
//if they are both false
if (!enum1 && !enum2)
{
break;
}
//if enum1 is false
else if (!enum1)
{
enum2 = enumerator2.MoveNext();
yield return t2;
}
//if enum2 is false
else if (!enum2)
{
enum1 = enumerator1.MoveNext();
yield return t1;
}
//they are both true
else
{
//if func returns true then t1 < t2
if (func(t1, t2))
{
enum1 = enumerator1.MoveNext();
yield return t1;
}
else
{
enum2 = enumerator2.MoveNext();
yield return t2;
}
}
}
}
}
Then to test it:
List<int> ws = new List<int>() { 1, 8, 9, 16, 17, 21 };
List<int> xs = new List<int>() { 2, 7, 10, 15, 18 };
List<int> ys = new List<int>() { 3, 6, 11, 14 };
List<int> zs = new List<int>() { 4, 5, 12, 13, 19, 20 };
List<IEnumerable<int>> lss = new List<IEnumerable<int>> { ws, xs, ys, zs };
foreach (var v in lss.mergeSortedEnumerables(compareInts))
{
Console.WriteLine(v);
}
I was asked this question as an interview question this evening and did not have a great answer in the 20 or so minutes allotted. So I forced myself to write an algorithm without doing any searches. The constraint was that the inputs were already sorted. Here's my code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Merger
{
class Program
{
static void Main(string[] args)
{
int[] a = { 1, 3, 6, 102, 105, 230 };
int[] b = { 101, 103, 112, 155, 231 };
var mm = new MergeMania();
foreach(var val in mm.Merge<int>(a, b))
{
Console.WriteLine(val);
}
Console.ReadLine();
}
}
public class MergeMania
{
public IEnumerable<T> Merge<T>(params IEnumerable<T>[] sortedSources)
where T : IComparable
{
if (sortedSources == null || sortedSources.Length == 0)
throw new ArgumentNullException("sortedSources");
//1. fetch enumerators for each sourc
var enums = (from n in sortedSources
select n.GetEnumerator()).ToArray();
//2. fetch enumerators that have at least one value
var enumsWithValues = (from n in enums
where n.MoveNext()
select n).ToArray();
if (enumsWithValues.Length == 0) yield break; //nothing to iterate over
//3. sort by current value in List<IEnumerator<T>>
var enumsByCurrent = (from n in enumsWithValues
orderby n.Current
select n).ToList();
//4. loop through
while (true)
{
//yield up the lowest value
yield return enumsByCurrent[0].Current;
//move the pointer on the enumerator with that lowest value
if (!enumsByCurrent[0].MoveNext())
{
//remove the first item in the list
enumsByCurrent.RemoveAt(0);
//check for empty
if (enumsByCurrent.Count == 0) break; //we're done
}
enumsByCurrent = enumsByCurrent.OrderBy(x => x.Current).ToList();
}
}
}
}
Hope it helps.
An attempt to improve on #cdiggins's answer.
This implementation works correctly if two elements that compare as equal are contained in two different sequences (i. e. doesn't have the flaw mentioned by #ChadHenderson).
The algorithm is described in Wikipedia, the complexity is O(m log n), where n is the number of lists being merged and m is the sum of the lengths of the lists.
The OrderedBag<T> from Wintellect.PowerCollections is used instead of a heap-based priority queue, but it doesn't change the complexity.
public static IEnumerable<T> Merge<T>(
IEnumerable<IEnumerable<T>> listOfLists,
Func<T, T, int> comparison = null)
{
IComparer<T> cmp = comparison != null
? Comparer<T>.Create(new Comparison<T>(comparison))
: Comparer<T>.Default;
List<IEnumerator<T>> es = listOfLists
.Select(l => l.GetEnumerator())
.Where(e => e.MoveNext())
.ToList();
var bag = new OrderedBag<IEnumerator<T>>(
(e1, e2) => cmp.Compare(e1.Current, e2.Current));
es.ForEach(e => bag.Add(e));
while (bag.Count > 0)
{
IEnumerator<T> e = bag.RemoveFirst();
yield return e.Current;
if (e.MoveNext())
{
bag.Add(e);
}
}
}
Each list being merged should be already sorted. This method will locate the equal elements with respect to the order of their lists. For example, if elements Ti == Tj, and they are respectively from list i and list j (i < j), then Ti will be in front of Tj in the merged result.
The complexity is O(mn), where n is the number of lists being merged and m is the sum of the lengths of the lists.
public static IEnumerable<T> Merge<T, TOrder>(this IEnumerable<IEnumerable<T>> TEnumerable_2, Func<T, TOrder> orderFunc, IComparer<TOrder> cmp=null)
{
if (cmp == null)
{
cmp = Comparer<TOrder>.Default;
}
List<IEnumerator<T>> TEnumeratorLt = TEnumerable_2
.Select(l => l.GetEnumerator())
.Where(e => e.MoveNext())
.ToList();
while (TEnumeratorLt.Count > 0)
{
int intMinIndex;
IEnumerator<T> TSmallest = TEnumeratorLt.GetMin(TElement => orderFunc(TElement.Current), out intMinIndex, cmp);
yield return TSmallest.Current;
if (TSmallest.MoveNext() == false)
{
TEnumeratorLt.RemoveAt(intMinIndex);
}
}
}
/// <summary>
/// Get the first min item in an IEnumerable, and return the index of it by minIndex
/// </summary>
public static T GetMin<T, TOrder>(this IEnumerable<T> self, Func<T, TOrder> orderFunc, out int minIndex, IComparer<TOrder> cmp = null)
{
if (self == null) throw new ArgumentNullException("self");
IEnumerator<T> selfEnumerator = self.GetEnumerator();
if (!selfEnumerator.MoveNext()) throw new ArgumentException("List is empty.", "self");
if (cmp == null) cmp = Comparer<TOrder>.Default;
T min = selfEnumerator.Current;
minIndex = 0;
int intCount = 1;
while (selfEnumerator.MoveNext ())
{
if (cmp.Compare(orderFunc(selfEnumerator.Current), orderFunc(min)) < 0)
{
min = selfEnumerator.Current;
minIndex = intCount;
}
intCount++;
}
return min;
}
I've took a more functional approach, hope this reads well.
First of all here is the merge method itself:
public static IEnumerable<T> MergeSorted<T>(IEnumerable<IEnumerable<T>> xss) where T :IComparable
{
var stacks = xss.Select(xs => new EnumerableStack<T>(xs)).ToList();
while (true)
{
if (stacks.All(x => x.IsEmpty)) yield break;
yield return
stacks
.Where(x => !x.IsEmpty)
.Select(x => new { peek = x.Peek(), x })
.MinBy(x => x.peek)
.x.Pop();
}
}
The idea is that we turn each IEnumerable into EnumerableStack that has Peek(), Pop() and IsEmpty members.
It works just like a regular stack. Note that calling IsEmpty might enumerate wrapped IEnumerable.
Here is the code:
/// <summary>
/// Wraps IEnumerable in Stack like wrapper
/// </summary>
public class EnumerableStack<T>
{
private enum StackState
{
Pending,
HasItem,
Empty
}
private readonly IEnumerator<T> _enumerator;
private StackState _state = StackState.Pending;
public EnumerableStack(IEnumerable<T> xs)
{
_enumerator = xs.GetEnumerator();
}
public T Pop()
{
var res = Peek(isEmptyMessage: "Cannot Pop from empty EnumerableStack");
_state = StackState.Pending;
return res;
}
public T Peek()
{
return Peek(isEmptyMessage: "Cannot Peek from empty EnumerableStack");
}
public bool IsEmpty
{
get
{
if (_state == StackState.Empty) return true;
if (_state == StackState.HasItem) return false;
ReadNext();
return _state == StackState.Empty;
}
}
private T Peek(string isEmptyMessage)
{
if (_state != StackState.HasItem)
{
if (_state == StackState.Empty) throw new InvalidOperationException(isEmptyMessage);
ReadNext();
if (_state == StackState.Empty) throw new InvalidOperationException(isEmptyMessage);
}
return _enumerator.Current;
}
private void ReadNext()
{
_state = _enumerator.MoveNext() ? StackState.HasItem : StackState.Empty;
}
}
Finally, here is the MinBy extension method in case haven't written one on your own already:
public static T MinBy<T, TS>(this IEnumerable<T> xs, Func<T, TS> selector) where TS : IComparable
{
var en = xs.GetEnumerator();
if (!en.MoveNext()) throw new Exception();
T max = en.Current;
TS maxVal = selector(max);
while(en.MoveNext())
{
var x = en.Current;
var val = selector(x);
if (val.CompareTo(maxVal) < 0)
{
max = x;
maxVal = val;
}
}
return max;
}
This is an alternate solution:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Reflection;
using System.Data;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Person
{
public string Name
{
get;
set;
}
public int Age
{
get;
set;
}
}
public class Program
{
public static void Main()
{
Person[] persons1 = new Person[] { new Person() { Name = "Ahmed", Age = 20 }, new Person() { Name = "Ali", Age = 40 } };
Person[] persons2 = new Person[] { new Person() { Name = "Zaid", Age = 21 }, new Person() { Name = "Hussain", Age = 22 } };
Person[] persons3 = new Person[] { new Person() { Name = "Linda", Age = 19 }, new Person() { Name = "Souad", Age = 60 } };
Person[][] personArrays = new Person[][] { persons1, persons2, persons3 };
foreach(Person person in MergeOrderedLists<Person, int>(personArrays, person => person.Age))
{
Console.WriteLine("{0} {1}", person.Name, person.Age);
}
Console.ReadLine();
}
static IEnumerable<T> MergeOrderedLists<T, TOrder>(IEnumerable<IEnumerable<T>> orderedlists, Func<T, TOrder> orderBy)
{
List<IEnumerator<T>> enumeratorsWithData = orderedlists.Select(enumerable => enumerable.GetEnumerator())
.Where(enumerator => enumerator.MoveNext()).ToList();
while (enumeratorsWithData.Count > 0)
{
IEnumerator<T> minEnumerator = enumeratorsWithData[0];
for (int i = 1; i < enumeratorsWithData.Count; i++)
if (((IComparable<TOrder>)orderBy(minEnumerator.Current)).CompareTo(orderBy(enumeratorsWithData[i].Current)) >= 0)
minEnumerator = enumeratorsWithData[i];
yield return minEnumerator.Current;
if (!minEnumerator.MoveNext())
enumeratorsWithData.Remove(minEnumerator);
}
}
}
}
I'm suspicious LINQ is smart enough to take advantage of the prior existing sort order:
IEnumerable<string> BiggerSortedList = BigListOne.Union(BigListTwo).OrderBy(s => s);