I've written a small test case to illustrate my issue.
I'm querying my DB and getting back a list of lists of tuples.
From that I want to extract a single list of tuples, with no duplicates, ordered by Item1. That part works fine, but I also want to remove any tuple whose Item2 breaks the descending order.
I was able to do this by creating a temporary list and then removing the bad tuples.
Could you please help me do this directly in LINQ (if possible)?
using System;
using System.Collections.Generic;
using System.Linq;
using NUnit.Framework;
namespace Web.Test
{
[TestFixture]
public class ListListTupleTest
{
[TestCase]
public void TestCaseTest_1()
{
var input = new List<List<Tuple<int, decimal>>>
{
new List<Tuple<int, decimal>>
{
new Tuple<int, decimal>(5, 20),
new Tuple<int, decimal>(8, 10)
},
new List<Tuple<int, decimal>>
{
new Tuple<int, decimal>(7, 17),
new Tuple<int, decimal>(12, 9)
},
new List<Tuple<int, decimal>>
{
new Tuple<int, decimal>(7, 17),
new Tuple<int, decimal>(15, 10)
}
};
var goal = new List<Tuple<int, decimal>>()
{
new Tuple<int, decimal>(5, 20),
new Tuple<int, decimal>(7, 17),
new Tuple<int, decimal>(8, 10),
new Tuple<int, decimal>(12, 9)
};
var result = myFunction(input);
CollectionAssert.AreEqual(goal, result); // NUnit expects (expected, actual)
}
private List<Tuple<int, decimal>> myFunction(List<List<Tuple<int, decimal>>> myList)
{
var tmp = myList
.SelectMany(x => x.ToArray())
.Distinct()
.OrderBy(x => x.Item1)
.ToList();
var result = new List<Tuple<int, decimal>>();
if (tmp.Any())
{
result.Add(tmp.First());
decimal current = tmp.First().Item2;
foreach (var tuple in tmp.Skip(1))
{
if (tuple.Item2 < current)
{
result.Add(tuple);
current = tuple.Item2;
}
}
}
return result;
}
}
}
I agree with others that a loop might be the best solution here, but if you really, really want to use LINQ, you can use Aggregate like this:
return myList
.SelectMany(x => x.ToArray())
.Distinct()
.OrderBy(x => x.Item1)
.Aggregate(Enumerable.Empty<Tuple<int, decimal>>(),
(acc, value) => value.Item2 >= acc.LastOrDefault()?.Item2 ?
acc :
acc.Concat(new[] {value}))
.ToList();
This basically replicates your loop: we start with an empty sequence (Enumerable.Empty&lt;Tuple&lt;int, decimal&gt;&gt;()) and Aggregate then feeds the values one by one to our callback. There we either return the previous sequence as-is or append the current item to it, depending on the Item2 comparison.
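As a quick sanity check, here is the same fold run against the question's sample data, already flattened, de-duplicated, and sorted. Note I use >= here so that a tuple whose Item2 merely equals the last kept value is rejected, matching the strict < in the original loop:

```csharp
using System;
using System.Linq;

// Sketch: the question's data after SelectMany/Distinct/OrderBy
// is (5,20) (7,17) (8,10) (12,9) (15,10).
var sorted = new[]
{
    Tuple.Create(5, 20m), Tuple.Create(7, 17m), Tuple.Create(8, 10m),
    Tuple.Create(12, 9m), Tuple.Create(15, 10m)
};
var kept = sorted.Aggregate(Enumerable.Empty<Tuple<int, decimal>>(),
        (acc, value) => value.Item2 >= acc.LastOrDefault()?.Item2
            ? acc                          // Item2 not descending: reject
            : acc.Concat(new[] { value })) // still descending: keep
    .ToList();
// kept holds (5,20) (7,17) (8,10) (12,9), the same as the goal list in the test;
// (15,10) is rejected because 10 is not less than the last kept Item2 (9)
```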
You can also use a List as the accumulator instead of Enumerable.Empty:
return myList
.SelectMany(x => x.ToArray())
.Distinct()
.OrderBy(x => x.Item1)
.Aggregate(new List<Tuple<int, decimal>>(),
(acc, value) =>
{
var last = acc.Count > 0 ? acc[acc.Count - 1] : null;
if (last == null || value.Item2 < last.Item2)
acc.Add(value);
return acc;
}); // ToList is not needed - already a list
To use LINQ for this, I use a special extension method that is based on the APL scan operator - it is like Aggregate, but returns all the intermediate results. In this case, I use a special variation that automatically pairs results with original data in a ValueTuple, and initializes the state with a Func on the first value:
public static IEnumerable<(TKey Key, T Value)> ScanPair<T, TKey>(this IEnumerable<T> src, Func<T, TKey> fnSeed, Func<(TKey Key, T Value), T, TKey> combine) {
using (var srce = src.GetEnumerator()) {
if (srce.MoveNext()) {
var seed = (fnSeed(srce.Current), srce.Current);
while (srce.MoveNext()) {
yield return seed;
seed = (combine(seed, srce.Current), srce.Current);
}
yield return seed;
}
}
}
Now it is relatively straightforward to compute your result; you do it pretty much as you described:
var ans = input.SelectMany(sub => sub) // flatten the lists into one list
.Distinct() // keep only distinct tuples
.OrderBy(s => s.Item1) // sort by Item1 ascending
.ScanPair(firstTuple => (Item2Desc: true, LastValidItem2: firstTuple.Item2), // set initial state (Is Item2 < previous valid Item2?, Last Valid Item2)
(state, cur) => cur.Item2 < state.Key.LastValidItem2 ? (true, cur.Item2) // if still descending, accept Tuple and remember new Item2
: (false, state.Key.LastValidItem2)) // reject Tuple and remember last valid Item2
.Where(statekv => statekv.Key.Item2Desc) // filter out invalid Tuples
.Select(statekv => statekv.Value); // return just the Tuples
Related
Hello, I have a List&lt;Tuple&lt;int, int&gt;&gt; and I want to check whether it contains repeated elements, regardless of order. So, for example, if my list contains
List<Tuple<int, int>> tuple = new List<Tuple<int, int>>()
{
new Tuple<int, int>(1, 2),
new Tuple<int, int>(2, 1),
new Tuple<int, int>(3, 2)
};
I want to remove the second item because it contains the same elements as the first, just in reverse order: (1,2) and (2,1).
What would be the most efficient way to do this?
var set = new HashSet<long>();
var unique = tuple.Where(t => set.Add((long)Math.Max(t.Item1, t.Item2) << 32 | Math.Min(t.Item1, t.Item2)))
If you will iterate the result more than once, add .ToList() at the end.
Update: to remove the duplicates from the original list in place (iterating backwards keeps the last occurrence of each pair):
var set = new HashSet<long>();
for (int i = tuple.Count - 1; i >= 0; i--)
{
var t = tuple[i]; // the original snippet referenced an undefined 't'
if (!set.Add((long)Math.Max(t.Item1, t.Item2) << 32 | Math.Min(t.Item1, t.Item2)))
tuple.RemoveAt(i);
}
You can use the DistinctBy function (built into LINQ as of .NET 6, or available from the MoreLINQ package):
var withoutDuplicates = tuple
.DistinctBy(t => Tuple.Create(Math.Min(t.Item1, t.Item2), Math.Max(t.Item1, t.Item2)))
.ToList();
You can define an IEqualityComparer and then you can use Linq's Distinct functionality:
var withoutDuplicate = tuple.Distinct(new IntegerTupleComparer()).ToList();
And here is a naive implementation of the IEqualityComparer&lt;Tuple&lt;int, int&gt;&gt;:
public class IntegerTupleComparer: IEqualityComparer<Tuple<int, int>>
{
public bool Equals(Tuple<int, int> lhs, Tuple<int, int> rhs)
{
if (lhs == null && rhs == null)
return true;
if (lhs == null || rhs == null)
return false;
// compare as unordered pairs; the original compared hash codes,
// which would also equate e.g. (1,3) and (2,2)
return (lhs.Item1 == rhs.Item1 && lhs.Item2 == rhs.Item2)
|| (lhs.Item1 == rhs.Item2 && lhs.Item2 == rhs.Item1);
}
public int GetHashCode(Tuple<int, int> _)
=> _.Item1.GetHashCode() + _.Item2.GetHashCode();
}
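For example, with the sample list from the question, Distinct with this comparer drops (2,1) as an unordered duplicate of (1,2). A short usage sketch, assuming the IntegerTupleComparer class above is in scope:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var tuple = new List<Tuple<int, int>>
{
    Tuple.Create(1, 2),
    Tuple.Create(2, 1), // same elements as (1,2), reversed
    Tuple.Create(3, 2)
};
var withoutDuplicate = tuple.Distinct(new IntegerTupleComparer()).ToList();
// withoutDuplicate now holds (1,2) and (3,2)
```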
Working example on dotnet fiddle
I need to retain all the lists that are redundant or not incremental, but my code so far only catches the items that are redundant.
_lst.Add(new MSheetValue
{
Column = 1,
Line = "1",
Pdf = "PDF1"
});
_lst.Add(new MSheetValue
{
Column = 1,
Line = "1",
Pdf = "PDF1"
});
_lst.Add(new MSheetValue
{
Column = 1,
Line = "1",
Pdf = "PDF2"
});
_lst.Add(new MSheetValue
{
Column = 1,
Line = "2",
Pdf = "PDF2"
});
_lst.Add(new MSheetValue
{
Column = 1,
Line = "3",
Pdf = "PDF2"
});
_lst.Add(new MSheetValue
{
Column = 1,
Line = "1",
Pdf = "PDF3"
});
_lst.Add(new MSheetValue
{
Column = 1,
Line = "3",
Pdf = "PDF3"
});
Here is my code
var result = _lst.GroupBy(x => new { x.Line, x.Pdf })
.Where(x => x.Skip(1).Any()).ToList();
and the result is
Column = 1,
Line = "1",
Pdf = "PDF1"
But I also need the lists that are not incremental,
so I also need this:
Column = 1,
Line = "1",
Pdf = "PDF3"
Column = 1,
Line = "3",
Pdf = "PDF3"
How can I solve this? I searched for a solution and tested what I found, but it doesn't return what I expected.
var distinctItems = _lst.Distinct();
To match on only some of the properties, create a custom equality comparer, e.g.:
class DistinctItemComparer : IEqualityComparer<Item> {
public bool Equals(Item x, Item y) {
return x.Column == y.Column &&
x.Line == y.Line &&
x.Pdf == y.Pdf;
}
public int GetHashCode(Item obj) {
return obj.Column.GetHashCode() ^
obj.Line.GetHashCode() ^
obj.Pdf.GetHashCode();
}
}
Then use it like this:
var distinctItems = _lst.Distinct(new DistinctItemComparer());
Or try it:
var distinctItems = _lst.GroupBy(x => new { x.Column, x.Line, x.Pdf }).Select(y => y.First());
Using Zip to pair adjacent items, then comparing each pair and selecting the items that are not sequential, may do the trick. This example is a little oversimplified, as you may want to compare the Pdf field as well. The Union adds the duplicates to the non-sequential items.
return _lst.Zip(_lst.Skip(1), (a, b) => new { a, b })
.Where(w => int.Parse(w.b.Line) != int.Parse(w.a.Line) + 1) // Line is a string, so parse before comparing
.Select(w => w.b)
.Union(_lst.GroupBy(x => new { x.Line, x.Pdf })
.Where(x => x.Skip(1).Any())
.SelectMany(s => s));
Using some handy extension methods:
public static class Ext {
public static IEnumerable<(TKey Key, T Value)> ScanPair<T, TKey>(this IEnumerable<T> src, TKey seedKey, Func<(TKey Key, T Value), T, TKey> combine) {
using (var srce = src.GetEnumerator()) {
if (srce.MoveNext()) {
var prevkv = (seedKey, srce.Current);
while (srce.MoveNext()) {
yield return prevkv;
prevkv = (combine(prevkv, srce.Current), srce.Current);
}
yield return prevkv;
}
}
}
public static IEnumerable<IGrouping<int, TRes>> GroupByWhile<T, TRes>(this IEnumerable<T> src, Func<T, T, bool> test, Func<T, TRes> result) =>
src.ScanPair(1, (kvp, cur) => test(kvp.Value, cur) ? kvp.Key : kvp.Key + 1).GroupBy(kvp => kvp.Key, kvp => result(kvp.Value));
public static IEnumerable<IGrouping<int, TRes>> GroupBySequential<T, TRes>(this IEnumerable<T> src, Func<T, int> SeqNum, Func<T, TRes> result) =>
src.GroupByWhile((prev, cur) => SeqNum(prev) + 1 == SeqNum(cur), result);
public static IEnumerable<IGrouping<int, T>> GroupBySequential<T>(this IEnumerable<T> src, Func<T, int> SeqNum) => src.GroupBySequential(SeqNum, e => e);
public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> src, Func<T, TKey> keyFun, IEqualityComparer<TKey> comparer = null) {
var seenKeys = new HashSet<TKey>(comparer);
foreach (var e in src)
if (seenKeys.Add(keyFun(e)))
yield return e;
}
public static int ToInteger(this string s) => Convert.ToInt32(s);
}
ScanPair is a variation of my APL-inspired Scan operator (which is like Aggregate, except that it returns the intermediate results). I found I was doing a lot of Scans with tuples just to carry along the original data, so ScanPair pairs each intermediate result with the original value.
Using ScanPair, GroupByWhile runs a test on each element and groups while the test is true.
Using GroupByWhile, GroupBySequential groups runs of elements whose sequence numbers are consecutive.
DistinctBy returns the distinct objects based on a key selection function. I cheat and use this rather than create an IEqualityComparer for MSheetValue.
Finally, ToInteger is just a handy extension for reading flow.
With these extension methods, processing the _lst is relatively straightforward:
var nonSeq = _lst.GroupBy(m => m.Pdf) // need to test each Pdf
.Select(mg => mg.GroupBySequential(m => m.Line.ToInteger())) // get the sequential groups
.Where(mg => mg.Count() > 1) // keep the ones with non-sequential lines
// parse each non-sequential group into just the unique entries and flatten
.Select(mg => mg.SelectMany(m => m).DistinctBy(m => new { m.Column, m.Line, m.Pdf }));
How can I join two lists of different lengths? The join should be by position.
E.g. {1,2,3,4} with {5,6,7}
I need to get a result like this:
{{1,5}, {2,6}, {3,7}, {4,null}}
I tried this:
var qry = a.Select((i, index) => new { i, j = b[index] });
but it throws an exception since the lists have different lengths.
Please help me find a solution.
This should work:
var a = new int?[] { 1, 2, 3, 4 };
var b = new int?[] { 5, 6, 7 };
var result = Enumerable.Range(0, Math.Max(a.Count(), b.Count()))
.Select(n => new[] {a.ElementAtOrDefault(n), b.ElementAtOrDefault(n)});
Do note the ? in the array declarations. That is necessary in order to have null values in the resulting list. Omitting the ? causes the result to have 0 instead of null.
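A minimal sketch of that default-value difference (variable names are illustrative):

```csharp
using System;
using System.Linq;

var plain = new int[] { 1, 2, 3 };
var nullable = new int?[] { 1, 2, 3 };
// Past the end of the sequence, ElementAtOrDefault returns default(T):
// default(int) is 0, while default(int?) is null.
Console.WriteLine(plain.ElementAtOrDefault(5));             // 0
Console.WriteLine(nullable.ElementAtOrDefault(5) == null);  // True
```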
If you can't or don't want to declare the arrays as int?, then you'll have to do the cast in the Select like so:
var result = Enumerable.Range(0, Math.Max(a.Count(), b.Count()))
.Select(n => new[] { a.Select(i => (int?)i).ElementAtOrDefault(n), b.Select(i => (int?)i).ElementAtOrDefault(n) });
This second bit of code will work correctly with regular int arrays or Lists.
The ugly but working version is the following:
a.Cast<int?>().Concat(Enumerable.Repeat<int?>(null, Math.Max(b.Count() - a.Count(), 0)))
.Zip(b.Cast<int?>()
.Concat(Enumerable.Repeat<int?>(null, Math.Max(a.Count() - b.Count(), 0))),
(x, y) => new { x, y });
Its drawback is that it enumerates each collection twice (the extra pass comes from calling .Count()).
So it is better to just write an extension:
static IEnumerable<TResult> ZipNull<T1, T2, TResult>(this IEnumerable<T1> a, IEnumerable<T2> b, Func<T1?, T2?, TResult> func)
where T1 : struct
where T2 : struct
{
using (var it1 = a.GetEnumerator())
using (var it2 = b.GetEnumerator())
{
while (true)
{
if (it1.MoveNext())
{
if (it2.MoveNext())
{
yield return func(it1.Current, it2.Current);
}
else
{
yield return func(it1.Current, null);
}
}
else
{
if (it2.MoveNext())
{
yield return func(null, it2.Current);
}
else
{
break;
}
}
}
}
}
and use it as
a.ZipNull(b, (x, y) => new { x, y });
What you have is effectively a Zip, but where it zips to the end of the longer, rather than the shorter, of the two sequences. You can write such a Zip method, with something that looks a bit similar to the actual Zip implementation:
public static IEnumerable<TResult> ZipAll<TSource, TSecond, TResult>(this IEnumerable<TSource> source,
IEnumerable<TSecond> other,
Func<TSource, TSecond, TResult> projection)
{
using (var firstIterator = source.GetEnumerator())
using (var secondIterator = other.GetEnumerator())
{
while (true)
{
bool hasFirst = firstIterator.MoveNext();
bool hasSecond = secondIterator.MoveNext();
TSource first = hasFirst ? firstIterator.Current : default(TSource);
TSecond second = hasSecond ? secondIterator.Current : default(TSecond);
if (hasFirst || hasSecond)
yield return projection(first, second);
else
yield break;
}
}
}
With that you can write:
a.ZipAll(b, (i, j) => new { i, j });
You could make the code a bit shorter by requiring the inputs to be lists, but that wouldn't make it any faster, just less typing. Supporting any sequence is not much extra work, so I'd say it's worth the few added lines of code.
Simply loop through the lists and build, say, a Dictionary&lt;int?, int?&gt; out of the paired elements. (Note this only works while the first list is the longer one and has no duplicates, since a Dictionary cannot contain a null key or repeated keys.)
var theFirstList = new List<int?> { 1, 2, 3, 4 };
var theSecondList = new List<int?> { 5, 6, 7 };
var el = new Dictionary<int?, int?>();
var length = Math.Max(theFirstList.Count, theSecondList.Count);
for (int i = 0; i < length; i++)
{
el.Add(theFirstList.ElementAtOrDefault(i), theSecondList.ElementAtOrDefault(i));
}
var x = new[] { 1, 2, 3, 4 }.ToList();
var y = new[] { 5, 6, 7 }.ToList();
var arrayLists = new[] {x, y}.OrderBy(t => t.Count).ToList();
var result = arrayLists
.Last()
.Select((item, i) => new[] { x[i], i < arrayLists.First().Count ? y[i] : (int?)null })
.ToList();
This should work for any pair of lists, although as written it assumes x is the longer one.
I have dictionary and I need get duplicate values.
For example:
Dictionary<int, List<string>> dictionary = new Dictionary<int, List<string>>();
List<string> list1 = new List<string> { "John", "Smith" };
List<string> list2 = new List<string> { "John", "Smith" };
List<string> list3 = new List<string> { "Mike", "Johnson" };
dictionary.Add(1, list1);
dictionary.Add(2, list2);
dictionary.Add(3, list3);
I need to find all duplicate values in the dictionary and, for each duplicated value, return the keys other than the smallest one (as a collection of keys). From my test dictionary I should get back a list containing only the key 2.
Maybe I chose the wrong data structure. I would like an optimal algorithm.
With your current structure you're in a bit of trouble, because there is no built-in way to compare two List&lt;string&gt; instances for equality.
One way to work around this is to create a custom comparer that implements IEqualityComparer&lt;List&lt;string&gt;&gt;. Since the lists might hold the same strings in different orders, we also need to sort both lists so that we compare the values in a consistent order, and that adds to the cost of the algorithm. On the other hand, if you only consider lists equal when their order already matches, you can skip the sorting and avoid that cost.
public class StringListComparer : IEqualityComparer<List<string>>
{
public bool Equals(List<string> x, List<string> y)
{
return CompareLists(x, y);
}
public int GetHashCode(List<string> obj)
{
// Order-insensitive, content-based hash. The original returned
// base.GetHashCode(), which is the same for every list and would
// force GroupBy to fall back on pairwise Equals calls.
return obj.Aggregate(0, (hash, s) => hash ^ (s?.GetHashCode() ?? 0));
}
private static bool CompareLists(List<string> x, List<string> y)
{
if (x.Count != y.Count)
return false;
// we HAVE to ensure that lists are in same order
// for a proper comparison
x = x.OrderBy(v => v).ToList();
y = y.OrderBy(v => v).ToList();
for (var i = 0; i < x.Count(); i++)
{
if (x[i] != y[i])
return false;
}
return true;
}
}
Once we have our comparer, we can use it to pull out keys from subsequent duplicates, leaving the first key (per your requirement).
public List<int> GetDuplicateKeys(Dictionary<int, List<string>> dictionary)
{
return dictionary
.OrderBy (x => x.Key)
.GroupBy(x => x.Value, new StringListComparer())
.Where (x => x.Count () > 1)
.Aggregate (
new List<int>(),
(destination, dict) =>
{
var first = dict.FirstOrDefault();
foreach (var kvp in dict)
{
if (!kvp.Equals(first))
destination.Add(kvp.Key);
}
return destination;
}
).ToList();
}
The following test outputs keys 2 and 4.
Dictionary<int, List<string>> dictionary = new Dictionary<int, List<string>>();
dictionary.Add(1, new List<string> { "John", "Smith" });
dictionary.Add(2, new List<string> { "John", "Smith" });
dictionary.Add(3, new List<string> { "Mike", "Johnson"});
dictionary.Add(4, new List<string> { "John", "Smith" });
var result = GetDuplicateKeys(dictionary);
You could create a list of KeyValuePair<int, List<string>>, that contains the ordered lists, with the outer list sorted. Then you could find duplicates very quickly. You'd need a list comparer that can compare ordered lists.
class MyListComparer: Comparer<List<string>>
{
public override int Compare(List<string> x, List<string> y)
{
for (var ix = 0; ix < x.Count && ix < y.Count; ++ix)
{
var rslt = x[ix].CompareTo(y[ix]);
if (rslt != 0)
{
return rslt;
}
}
// exhausted one of the lists.
// Compare the lengths.
return x.Count.CompareTo(y.Count);
}
}
var comparer = new MyListComparer();
var sortedList = dictionary
.Select(kvp => new KeyValuePair<int, List<string>>(kvp.Key, kvp.Value.OrderBy(v => v).ToList()))
.OrderBy(kvp => kvp.Value, comparer)
.ThenBy(kvp => kvp.Key);
Note the ThenBy, which ensures that if two lists are equal, the one with the smaller key appears first. This is necessary because, although OrderBy is a stable sort, there is no guarantee that enumerating the dictionary returns items in key order.
// the lists are now sorted by value. So `"Smith, John"` will appear before `"Smith, William"`.
// We can go through the list sequentially to output duplicates.
var previousList = new List<string>();
foreach (var kvp in sortedList)
{
if (kvp.Value.SequenceEqual(previousList))
{
// this list is a duplicate
// Lookup the list using the key.
var dup = dictionary[kvp.Key];
// Do whatever you need to do with the dup
}
else
{
previousList = kvp.Value;
}
}
This sorts each list only once. It does use more memory, because it duplicates the dictionary's contents in that list of KeyValuePair&lt;int, List&lt;string&gt;&gt;, but for larger data sets it should be much faster than sorting each list repeatedly and comparing it against every other list.
Caveats: The code above assumes that none of the lists are null, and none of them are empty. If a list in the dictionary can be null or empty, then you'll have to add some special case code. But the general approach would be the same.
Is there an elegant way of converting this string array:
string[] a = new[] {"name", "Fred", "colour", "green", "sport", "tennis"};
into a Dictionary such that every two successive elements of the array become one {key, value} pair of the dictionary (I mean {"name" -> "Fred", "colour" -> "green", "sport" -> "tennis"})?
I can do it easily with a loop, but is there a more elegant way, perhaps using LINQ?
var dict = a.Select((s, i) => new { s, i })
.GroupBy(x => x.i / 2)
.ToDictionary(g => g.First().s, g => g.Last().s);
Since it's an array I would do this:
var result = Enumerable.Range(0,a.Length/2)
.ToDictionary(x => a[2 * x], x => a[2 * x + 1]);
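As a quick check, for the array from the question this approach produces the three expected pairs:

```csharp
using System;
using System.Linq;

string[] a = { "name", "Fred", "colour", "green", "sport", "tennis" };
var dict = Enumerable.Range(0, a.Length / 2)
    .ToDictionary(x => a[2 * x], x => a[2 * x + 1]);
// dict: { "name" -> "Fred", "colour" -> "green", "sport" -> "tennis" }
// With an odd-length array the trailing element is simply ignored,
// because a.Length / 2 uses integer division.
```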
How about this?
var q = a.Zip(a.Skip(1), (Key, Value) => new { Key, Value })
.Where((pair,index) => index % 2 == 0)
.ToDictionary(pair => pair.Key, pair => pair.Value);
I've made a similar method to handle this type of request. But since your array contains both keys and values, I think you need to split it first.
Then you can use something like this to combine them
public static IDictionary<T, T2> ZipMyTwoListToDictionary<T, T2>(IEnumerable<T> listContainingKeys, IEnumerable<T2> listContainingValue)
{
return listContainingValue.Zip(listContainingKeys, (value, key) => new { value, key }).ToDictionary(i => i.key, i => i.value);
}
a.Select((input, index) => new { input, index })
.Where(x => x.index % 2 == 0) // keep the even indices, which hold the keys
.ToDictionary(x => a[x.index], x => a[x.index + 1])
I would recommend using a for loop, but I have answered as you requested. This is by no means neater or cleaner.
public static IEnumerable<T> EveryOther<T>(this IEnumerable<T> source)
{
bool shouldReturn = true;
foreach (T item in source)
{
if (shouldReturn)
yield return item;
shouldReturn = !shouldReturn;
}
}
public static Dictionary<T, T> MakeDictionary<T>(IEnumerable<T> source)
{
return source.EveryOther()
.Zip(source.Skip(1).EveryOther(), (a, b) => new { Key = a, Value = b })
.ToDictionary(pair => pair.Key, pair => pair.Value);
}
The way this is set up, and because of the way Zip works, if there is an odd number of items in the list, the last item will be ignored rather than generating some sort of exception.
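A short sketch of that truncation behaviour, assuming the EveryOther and MakeDictionary methods above are in scope:

```csharp
var source = new[] { "a", "b", "c", "d", "e" };
// EveryOther yields "a","c","e"; the Skip(1).EveryOther side yields "b","d".
// Zip stops at the shorter sequence, so the odd trailing "e" is dropped.
var dict = MakeDictionary(source);
// dict: { "a" -> "b", "c" -> "d" }
```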
Note: derived from this answer.
IEnumerable<string> strArray = new string[] { "name", "Fred", "colour", "green", "sport", "tennis" };
var even = strArray.Where((c, i) => i % 2 == 0).ToList();
var odd = strArray.Where((c, i) => i % 2 != 0).ToList();
Dictionary<string, string> dict = even.ToDictionary(x => x, x => odd[even.IndexOf(x)]);