Get duplicate values from dictionary - c#

I have a dictionary and I need to get the duplicate values.
For example:
Dictionary<int, List<string>> dictionary = new Dictionary<int, List<string>>();
List<string> list1 = new List<string> { "John", "Smith" };
List<string> list2 = new List<string> { "John", "Smith" };
List<string> list3 = new List<string> { "Mike", "Johnson" };
dictionary.Add(1, list1);
dictionary.Add(2, list2);
dictionary.Add(3, list3);
I need to find all duplicates in the dictionary and return the max keys (a collection of keys) of each group of duplicate values. For my test dictionary, I need to return a list with only one key: 2.
Maybe I chose the wrong data structure. I would like an optimal algorithm.

With your current structure, you're in a bit of trouble because you don't necessarily have an easy way to compare two List<string> to see if they are equal.
One way to work around this is to create a custom comparer that implements IEqualityComparer<List<string>>. However, because the strings inside two equal lists may be in a different order, we also need to sort both lists so that we compare the values position by position. That sorting adds to the cost of the algorithm. On the other hand, if the order of the values inside the lists already matches (or should matter), you can skip the sort and avoid that cost.
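using System.Collections.Generic;
using System.Linq;

public class StringListComparer : IEqualityComparer<List<string>>
{
    public bool Equals(List<string> x, List<string> y)
    {
        return CompareLists(x, y);
    }

    public int GetHashCode(List<string> obj)
    {
        // Order-independent hash, so lists with the same items in any order land in
        // the same bucket. A constant hash would still be correct, but it would force
        // GroupBy to compare every pair of lists.
        int hash = 0;
        foreach (var s in obj)
            hash = unchecked(hash + (s?.GetHashCode() ?? 0));
        return hash;
    }

    private static bool CompareLists(List<string> x, List<string> y)
    {
        if (ReferenceEquals(x, y))
            return true;
        if (x == null || y == null || x.Count != y.Count)
            return false;
        // we HAVE to ensure that lists are in same order
        // for a proper comparison
        x = x.OrderBy(v => v).ToList();
        y = y.OrderBy(v => v).ToList();
        for (var i = 0; i < x.Count; i++)
        {
            if (x[i] != y[i])
                return false;
        }
        return true;
    }
}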
Once we have our comparer, we can use it to pull out keys from subsequent duplicates, leaving the first key (per your requirement).
public List<int> GetDuplicateKeys(Dictionary<int, List<string>> dictionary)
{
    return dictionary
        .OrderBy(x => x.Key)
        .GroupBy(x => x.Value, new StringListComparer())
        .Where(g => g.Count() > 1)
        // within each group of duplicates, skip the first (smallest) key and keep the rest
        .SelectMany(g => g.Skip(1).Select(kvp => kvp.Key))
        .ToList();
}
The following test outputs keys 2 and 4.
Dictionary<int, List<string>> dictionary = new Dictionary<int, List<string>>();
dictionary.Add(1, new List<string> { "John", "Smith" });
dictionary.Add(2, new List<string> { "John", "Smith" });
dictionary.Add(3, new List<string> { "Mike", "Johnson"});
dictionary.Add(4, new List<string> { "John", "Smith" });
var result = GetDuplicateKeys(dictionary);
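For comparison, here is a self-contained sketch that groups by a canonical string key instead of using a custom comparer: each list is sorted ordinally and joined into one string, and lists with equal keys are duplicates. It assumes the values never contain the separator character (U+001F is an arbitrary choice), `CanonicalKey` is a hypothetical helper name, and it is written as a C# 9+ top-level program. Key 4 stores the names in a different order to show that the grouping ignores order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var dictionary = new Dictionary<int, List<string>>
{
    [1] = new() { "John", "Smith" },
    [2] = new() { "John", "Smith" },
    [3] = new() { "Mike", "Johnson" },
    [4] = new() { "Smith", "John" }, // same names, different order
};

// Hypothetical helper: sort the values ordinally and join them with a separator
// that is assumed never to appear inside a value (U+001F, the unit separator).
static string CanonicalKey(List<string> values) =>
    string.Join("\u001F", values.OrderBy(v => v, StringComparer.Ordinal));

var duplicateKeys = dictionary
    .OrderBy(kvp => kvp.Key)                            // smallest key first in each group
    .GroupBy(kvp => CanonicalKey(kvp.Value))            // equal keys mean duplicate lists
    .SelectMany(g => g.Skip(1).Select(kvp => kvp.Key))  // every key after the first
    .ToList();

Console.WriteLine(string.Join(", ", duplicateKeys)); // prints "2, 4"
```

Each list is sorted once to build its key, instead of being re-sorted inside every Equals call, at the cost of allocating one string per list.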

You could create a list of KeyValuePair<int, List<string>> in which each inner list is sorted, and then sort that outer list. Equal lists end up next to each other, so you can find the duplicates very quickly in a single pass. You'd need a list comparer that can compare sorted lists.
class MyListComparer: Comparer<List<string>>
{
public override int Compare(List<string> x, List<string> y)
{
for (var ix = 0; ix < x.Count && ix < y.Count; ++ix)
{
var rslt = x[ix].CompareTo(y[ix]);
if (rslt != 0)
{
return rslt;
}
}
// exhausted one of the lists.
// Compare the lengths.
return x.Count.CompareTo(y.Count);
}
}
var comparer = new MyListComparer();
var sortedList = dictionary
    .Select(kvp => new KeyValuePair<int, List<string>>(kvp.Key, kvp.Value.OrderBy(v => v).ToList()))
    .OrderBy(kvp => kvp.Value, comparer)
    .ThenBy(kvp => kvp.Key)
    .ToList();
Note the ThenBy, which ensures that if two lists are equal, the one with the smaller key appears first. This is necessary because, although OrderBy does a stable sort, there's no guarantee that enumerating the dictionary returns items in key order.
// the lists are now sorted by value. So `"Smith, John"` will appear before `"Smith, William"`.
// We can go through the list sequentially to output duplicates.
var previousList = new List<string>();
foreach (var kvp in sortedList)
{
if (kvp.Value.SequenceEqual(previousList))
{
// this list is a duplicate
// Lookup the list using the key.
var dup = dictionary[kvp.Key];
// Do whatever you need to do with the dup
}
else
{
previousList = kvp.Value;
}
}
This sorts each list only once. It does use more memory, because it duplicates the dictionary in that list of KeyValuePair<int, List<string>>, but for larger data sets it should be much faster than sorting each list multiple times and comparing it against every other list.
Caveats: The code above assumes that none of the lists are null, and none of them are empty. If a list in the dictionary can be null or empty, then you'll have to add some special case code. But the general approach would be the same.
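One way to lift the empty-list caveat is to compare each sorted entry with its immediate predecessor instead of starting from an empty sentinel list. The sketch below is a C# 9+ top-level program; `CompareLists` and the sample data are illustrative, it uses ordinal comparison throughout, and it still assumes that no list is null. The duplicate keys come out in the order of the sorted lists rather than in key order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var dictionary = new Dictionary<int, List<string>>
{
    [1] = new() { "John", "Smith" },
    [2] = new() { "John", "Smith" },
    [3] = new() { "Mike", "Johnson" },
    [4] = new() { "Smith", "John" },
    [5] = new(),   // empty list
    [6] = new(),   // another empty list, so key 6 duplicates key 5
};

// Lexicographic, ordinal comparison of two lists that are already sorted.
static int CompareLists(List<string> x, List<string> y)
{
    for (int i = 0; i < x.Count && i < y.Count; i++)
    {
        int c = string.CompareOrdinal(x[i], y[i]);
        if (c != 0) return c;
    }
    return x.Count.CompareTo(y.Count); // on a common prefix, the shorter list sorts first
}

// Sort each inner list exactly once, then sort the entries by contents, then by key.
var entries = dictionary
    .Select(kvp => (Key: kvp.Key, Sorted: kvp.Value.OrderBy(v => v, StringComparer.Ordinal).ToList()))
    .ToList();
entries.Sort((a, b) =>
{
    int c = CompareLists(a.Sorted, b.Sorted);
    return c != 0 ? c : a.Key.CompareTo(b.Key);
});

// Equal lists are now adjacent; every entry equal to its predecessor is a duplicate.
var duplicateKeys = new List<int>();
for (int i = 1; i < entries.Count; i++)
{
    if (CompareLists(entries[i].Sorted, entries[i - 1].Sorted) == 0)
        duplicateKeys.Add(entries[i].Key);
}

Console.WriteLine(string.Join(", ", duplicateKeys)); // prints "6, 2, 4"
```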

Related

Help needed with sorted tuple and LINQ query

I've written a little test case to explain my issue.
Querying my DB gives me a list of lists of tuples.
From it, I want to extract a list of tuples with no duplicates, ordered by Item1. That part is fine, but I also want to drop every tuple whose Item2 does not continue a strictly descending sequence.
I was able to do this by creating a temporary list and then removing the bad tuples.
Could you please help me do this directly in LINQ (if possible)?
using System;
using System.Collections.Generic;
using System.Linq;
using NUnit.Framework;
namespace Web.Test
{
[TestFixture]
public class ListListTupleTest
{
[TestCase]
public void TestCaseTest_1()
{
var input = new List<List<Tuple<int, decimal>>>
{
new List<Tuple<int, decimal>>
{
new Tuple<int, decimal>(5, 20),
new Tuple<int, decimal>(8, 10)
},
new List<Tuple<int, decimal>>
{
new Tuple<int, decimal>(7, 17),
new Tuple<int, decimal>(12, 9)
},
new List<Tuple<int, decimal>>
{
new Tuple<int, decimal>(7, 17),
new Tuple<int, decimal>(15, 10)
}
};
var goal = new List<Tuple<int, decimal>>()
{
new Tuple<int, decimal>(5, 20),
new Tuple<int, decimal>(7, 17),
new Tuple<int, decimal>(8, 10),
new Tuple<int, decimal>(12, 9)
};
var result = myFunction(input);
CollectionAssert.AreEqual(result, goal);
}
private List<Tuple<int, decimal>> myFunction(List<List<Tuple<int, decimal>>> myList)
{
var tmp = myList
.SelectMany(x => x.ToArray())
.Distinct()
.OrderBy(x => x.Item1)
.ToList();
var result = new List<Tuple<int, decimal>>();
if (tmp.Any())
{
result.Add(tmp.First());
decimal current = tmp.First().Item2;
foreach (var tuple in tmp.Skip(1))
{
if (tuple.Item2 < current)
{
result.Add(tuple);
current = tuple.Item2;
}
}
}
return result;
}
}
}
I agree with the others that a loop might be the best solution here, but if you really want to use LINQ, you can use Aggregate like this:
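return myList
    .SelectMany(x => x.ToArray())
    .Distinct()
    .OrderBy(x => x.Item1)
    .Aggregate(Enumerable.Empty<Tuple<int, decimal>>(),
        // skip the tuple unless its Item2 is strictly below the last kept Item2
        // (for the first tuple, LastOrDefault() is null and the comparison is false)
        (acc, value) => value.Item2 >= acc.LastOrDefault()?.Item2 ?
            acc :
            acc.Concat(new[] { value }))
    .ToList();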
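This basically replicates your loop: we start with an empty sequence (Enumerable.Empty<Tuple<int, decimal>>()), and Aggregate passes the values to our callback one by one. There we either return the previous sequence as is or append the current item to it, depending on the Item2 comparison. Note that LastOrDefault may re-enumerate the growing Concat chain on every step, so this version can become quadratic on large inputs.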
You can also use a List as the accumulator instead of Enumerable.Empty; appending to it and reading its last element are both cheap:
return myList
.SelectMany(x => x.ToArray())
.Distinct()
.OrderBy(x => x.Item1)
.Aggregate(new List<Tuple<int, decimal>>(),
(acc, value) =>
{
var last = acc.Count > 0 ? acc[acc.Count - 1] : null;
if (last == null || value.Item2 < last.Item2)
acc.Add(value);
return acc;
}); // ToList is not needed - already a list
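To check the List-accumulator version against the data from the question's test case, here is a self-contained sketch. It is a C# 9+ top-level program that builds the tuples with Tuple.Create instead of using the NUnit fixture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var input = new List<List<Tuple<int, decimal>>>
{
    new() { Tuple.Create(5, 20m), Tuple.Create(8, 10m) },
    new() { Tuple.Create(7, 17m), Tuple.Create(12, 9m) },
    new() { Tuple.Create(7, 17m), Tuple.Create(15, 10m) },
};

var result = input
    .SelectMany(x => x)
    .Distinct()            // Tuple<,> compares structurally, so (7, 17) is kept once
    .OrderBy(x => x.Item1)
    .Aggregate(new List<Tuple<int, decimal>>(), (acc, value) =>
    {
        // keep the tuple only if its Item2 is strictly below the last kept Item2
        if (acc.Count == 0 || value.Item2 < acc[acc.Count - 1].Item2)
            acc.Add(value);
        return acc;
    });

foreach (var t in result)
    Console.WriteLine($"{t.Item1}: {t.Item2}");
```

The tuple (15, 10) is dropped because 10 is not below the previous kept Item2 of 9.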
To use LINQ for this, I use a special extension method that is based on the APL scan operator - it is like Aggregate, but returns all the intermediate results. In this case, I use a special variation that automatically pairs results with original data in a ValueTuple, and initializes the state with a Func on the first value:
public static IEnumerable<(TKey Key, T Value)> ScanPair<T, TKey>(this IEnumerable<T> src, Func<T, TKey> fnSeed, Func<(TKey Key, T Value), T, TKey> combine) {
using (var srce = src.GetEnumerator()) {
if (srce.MoveNext()) {
var seed = (fnSeed(srce.Current), srce.Current);
while (srce.MoveNext()) {
yield return seed;
seed = (combine(seed, srce.Current), srce.Current);
}
yield return seed;
}
}
}
Now it is relatively straightforward to compute your result; you do it pretty much as you describe it:
var ans = input.SelectMany(sub => sub, (l, s) => s) // flatten lists to one list
.Distinct() // keep only distinct tuples
.OrderBy(s => s.Item1) // sort by Item1 ascending
.ScanPair(firstTuple => (Item2Desc: true, LastValidItem2: firstTuple.Item2), // set initial state (Is Item2 < previous valid Item2?, Last Valid Item2)
(state, cur) => cur.Item2 < state.Key.LastValidItem2 ? (true, cur.Item2) // if still descending, accept Tuple and remember new Item2
: (false, state.Key.LastValidItem2)) // reject Tuple and remember last valid Item2
.Where(statekv => statekv.Key.Item2Desc) // filter out invalid Tuples
.Select(statekv => statekv.Value); // return just the Tuples
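If the scan idea is unfamiliar, this minimal sketch contrasts it with Aggregate on a plain sequence of numbers. `Scan` here is a hypothetical local helper, not a LINQ to Objects operator; it yields the accumulator after each element, whereas ScanPair above additionally pairs each state with its source item. It is written as a C# 9+ top-level program.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new[] { 1, 2, 3, 4 };

// Aggregate folds the whole sequence into one final value...
int total = numbers.Aggregate(0, (acc, n) => acc + n);

// ...while a scan yields the accumulator after every element.
var runningTotals = Scan(numbers, 0, (acc, n) => acc + n).ToList();

Console.WriteLine(total);                            // 10
Console.WriteLine(string.Join(", ", runningTotals)); // 1, 3, 6, 10

// Hypothetical helper, not part of System.Linq: a plain scan without pairing.
static IEnumerable<TAcc> Scan<T, TAcc>(IEnumerable<T> source, TAcc seed, Func<TAcc, T, TAcc> step)
{
    var acc = seed;
    foreach (var item in source)
    {
        acc = step(acc, item);
        yield return acc;
    }
}
```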

Merge intersecting items in a list of lists

I have a list of lists, and I would like to use LINQ to merge all lists that share at least one value, producing a new list of the merged lists. Here's an example:
var animalGroups = new List<List<Animal>>{
new List<Animal>{lizard,cat,cow,dog},
new List<Animal>{horse, chicken, pig, turkey},
new List<Animal>{ferret,duck,cat,parrot},
new List<Animal>{chicken,sheep,horse,rabbit}
};
The desired output would be a new List<List<Animal>> containing the following List<Animal>:
{lizard, cat, cow, dog, ferret, duck, parrot}
{horse, chicken, pig, turkey, sheep, rabbit}
I'm rather new to LINQ, and I got stuck at grouping the intersecting lists without creating duplicates.
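Here is a possible solution using a list of strings. Note that it only merges pairs of lists that overlap directly: a chain (A overlaps B, B overlaps C) is not combined into one group, and a list that overlaps no other list is left out of the result.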
var animalGroups = new List<List<string>>
{
new List<string> {"lizard", "cat", "cow", "dog"},
new List<string> {"horse", "chicken", "pig", "turkey"},
new List<string> {"ferret", "duck", "cat", "parrot"},
new List<string> {"chicken", "sheep", "horse", "rabbit"}
};
List<List<string>> mergedList = new List<List<string>>();
for (int i = 0; i < animalGroups.Count; i++)
{
for (int j = i+1; j < animalGroups.Count; j++)
{
if (animalGroups[i].Intersect(animalGroups[j]).Any())
{
mergedList.Add(animalGroups[i].Concat(animalGroups[j]).Distinct().ToList());
}
}
}
First, remember to override Equals and GetHashCode and/or implement IEquatable<Animal> meaningfully in your Animal class (for example, by comparing the Name).
List<IEnumerable<Animal>> mergedLists = animalGroups.MergeIntersectingLists().ToList();
It uses the following extension method, which works with any type:
public static IEnumerable<IEnumerable<T>> MergeIntersectingLists<T>(this IEnumerable<IEnumerable<T>> itemLists, IEqualityComparer<T> comparer = null)
{
    if (comparer == null) comparer = EqualityComparer<T>.Default;
    // maps each item to the HashSet (group) it belongs to;
    // all items of one group share the same HashSet instance
    var itemListDict = new Dictionary<T, HashSet<T>>(comparer);
    foreach (IEnumerable<T> sequence in itemLists)
    {
        IList<T> list = sequence as IList<T> ?? sequence.ToList();
        // every existing group this list touches (HashSet<T> compares by reference)
        List<HashSet<T>> touched = list
            .Where(itemListDict.ContainsKey)
            .Select(i => itemListDict[i])
            .Distinct()
            .ToList();
        // reuse the first touched group, or start a new one
        HashSet<T> itemStorage = touched.FirstOrDefault() ?? new HashSet<T>(comparer);
        // a list can bridge several groups, so fold them all into one
        foreach (HashSet<T> other in touched.Skip(1))
            itemStorage.UnionWith(other);
        itemStorage.UnionWith(list);
        // point every item, including newly added ones, at the merged storage
        foreach (T item in itemStorage)
            itemListDict[item] = itemStorage;
    }
    // Distinct removes duplicate HashSets because of reference equality:
    // many keys share the same HashSet reference as their value
    return itemListDict.Values.Distinct();
}
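Lists that bridge two earlier groups (a later list shares items with two groups that were separate so far) are the tricky case in this problem. A disjoint-set (union-find) structure over the list indices is a different technique from the HashSet-sharing approach above that handles them. This sketch is a C# 9+ top-level program that uses strings in place of Animal; `MergeByUnionFind` is a hypothetical name. It unions each list with the first list in which each of its items appeared, then emits one group per set in first-seen order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var animalGroups = new List<List<string>>
{
    new() { "lizard", "cat", "cow", "dog" },
    new() { "horse", "chicken", "pig", "turkey" },
    new() { "ferret", "duck", "cat", "parrot" },
    new() { "chicken", "sheep", "horse", "rabbit" },
};

var merged = MergeByUnionFind(animalGroups);
foreach (var g in merged)
    Console.WriteLine(string.Join(", ", g));

// Hypothetical helper: lists end up in the same set when they are connected
// through any chain of shared items, including lists that bridge two groups.
static List<List<T>> MergeByUnionFind<T>(List<List<T>> lists)
{
    var parent = Enumerable.Range(0, lists.Count).ToArray();

    int Find(int i)
    {
        while (parent[i] != i)
        {
            parent[i] = parent[parent[i]]; // path halving keeps the trees shallow
            i = parent[i];
        }
        return i;
    }

    // Union each list with the first list in which each of its items appeared.
    var firstListOf = new Dictionary<T, int>();
    for (int i = 0; i < lists.Count; i++)
    {
        foreach (var item in lists[i])
        {
            if (firstListOf.TryGetValue(item, out int j))
                parent[Find(i)] = Find(j);
            else
                firstListOf[item] = i;
        }
    }

    // Emit one group per set in first-seen order. An item belongs to exactly one
    // set, so a single HashSet is enough to skip duplicates across all groups.
    var buckets = new Dictionary<int, List<T>>();
    var order = new List<int>();
    var seen = new HashSet<T>();
    for (int i = 0; i < lists.Count; i++)
    {
        int root = Find(i);
        if (!buckets.TryGetValue(root, out var bucket))
        {
            bucket = new List<T>();
            buckets[root] = bucket;
            order.Add(root);
        }
        foreach (var item in lists[i])
            if (seen.Add(item))
                bucket.Add(item);
    }
    return order.Select(r => buckets[r]).ToList();
}
```

For the sample data this prints the two groups from the question's desired output, in the same order.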
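Your question is somewhat vague. If you want to combine lists 0, 2, 4, ... into one group and lists 1, 3, 5, ... into another, and you are looking for a LINQ solution (note that this groups the lists by position and does not check for shared items; it happens to match your example):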
// I don't have Animal class, that's why I've put string
// Be sure that Animal implements Equals as well as GetHashCode methods
var animalGroups = new List<List<string>> {
new List<string> {"lizard", "cat", "cow", "dog"},
new List<string> {"horse", "chicken", "pig", "turkey"},
new List<string> {"ferret", "duck", "cat", "parrot"},
new List<string> {"chicken", "sheep", "horse", "rabbit"}
};
var result = animalGroups
.Select((list, index) => new {
list = list,
index = index, })
.GroupBy(item => item.index % 2, // grouping 0, 2, ... 2n as well as 1, 3,... 2n - 1
item => item.list)
.Select(chunk => chunk
.SelectMany(c => c)
.Distinct()
.ToList())
.ToList();
Let's visualize the result:
string test = string.Join(Environment.NewLine, result
.Select(list => string.Join(", ", list)));
Console.WriteLine(test);
Outcome
lizard, cat, cow, dog, ferret, duck, parrot
horse, chicken, pig, turkey, sheep, rabbit

How to find members that exist in at least two lists in a list of lists

I have an array of lists:
var stringLists = new List<string>[]
{
new List<string>(){ "a", "b", "c" },
new List<string>(){ "d", "b", "c" },
new List<string>(){ "a", "d", "c" }
};
I want to extract all elements that are common in at least 2 lists. So for this example, I should get all elements ["a", "b", "c", "d"]. I know how to find elements common to all but couldn't think of any way to solve this problem.
You could use something like this:
var result = stringLists.SelectMany(l => l.Distinct())
.GroupBy(e => e)
.Where(g => g.Count() >= 2)
.Select(g => g.Key);
Just for fun, here are some iterative solutions:
var seen = new HashSet<string>();
var current = new HashSet<string>();
var result = new HashSet<string>();
foreach (var list in stringLists)
{
foreach(var element in list)
if(current.Add(element) && !seen.Add(element))
result.Add(element);
current.Clear();
}
or:
var already_seen = new Dictionary<string, bool>();
foreach(var list in stringLists)
foreach(var element in list.Distinct())
already_seen[element] = already_seen.ContainsKey(element);
var result = already_seen.Where(kvp => kvp.Value).Select(kvp => kvp.Key);
or (inspired by Tim's answer):
int tmp;
var items = new Dictionary<string,int>();
foreach(var str in stringLists.SelectMany(l => l.Distinct()))
{
items.TryGetValue(str, out tmp);
items[str] = tmp + 1;
}
var result = items.Where(kv => kv.Value >= 2).Select(kv => kv.Key);
You could use a Dictionary<string, int>, the key is the string and the value is the count:
Dictionary<string, int> itemCounts = new Dictionary<string,int>();
for(int i = 0; i < stringLists.Length; i++)
{
List<string> list = stringLists[i];
foreach(string str in list.Distinct())
{
if(itemCounts.ContainsKey(str))
itemCounts[str] += 1;
else
itemCounts.Add(str, 1);
}
}
var result = itemCounts.Where(kv => kv.Value >= 2).Select(kv => kv.Key);
I use list.Distinct() because you only want to count occurrences in different lists.
As requested, here is an extension method which you can reuse with any type:
public static IEnumerable<T> GetItemsWhichOccurAtLeastIn<T>(this IEnumerable<IEnumerable<T>> seq, int minCount, IEqualityComparer<T> comparer = null)
{
if (comparer == null) comparer = EqualityComparer<T>.Default;
Dictionary<T, int> itemCounts = new Dictionary<T, int>(comparer);
foreach (IEnumerable<T> subSeq in seq)
{
foreach (T x in subSeq.Distinct(comparer))
{
if (itemCounts.ContainsKey(x))
itemCounts[x] += 1;
else
itemCounts.Add(x, 1);
}
}
foreach(var kv in itemCounts.Where(kv => kv.Value >= minCount))
yield return kv.Key;
}
Usage is simple:
string result = String.Join(",", stringLists.GetItemsWhichOccurAtLeastIn(2)); // a,b,c,d
Follow these steps:
Create a Dictionary that maps each element to a list of list indices.
Loop over all lists.
For list number i, for each element in the list, add i to that element's entry, dictionary[element].Add(i), if it is not already present.
Return the elements whose index lists have at least two entries.
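The steps above could be sketched like this, as a C# 9+ top-level program; `listsContaining` is an illustrative name. Because all of one list's elements are processed together, checking the last recorded index is enough for the "if not already present" step.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var stringLists = new List<string>[]
{
    new List<string>() { "a", "b", "c" },
    new List<string>() { "d", "b", "c" },
    new List<string>() { "a", "d", "c" },
};

// element -> indices of the lists that contain it
var listsContaining = new Dictionary<string, List<int>>();
for (int i = 0; i < stringLists.Length; i++)
{
    foreach (var element in stringLists[i])
    {
        if (!listsContaining.TryGetValue(element, out var indices))
        {
            indices = new List<int>();
            listsContaining[element] = indices;
        }
        // "if not already present": list i's elements are visited consecutively,
        // so i can only already be present as the last recorded index
        if (indices.Count == 0 || indices[indices.Count - 1] != i)
            indices.Add(i);
    }
}

// keep the elements that were recorded for at least two different lists
var result = listsContaining
    .Where(kv => kv.Value.Count >= 2)
    .Select(kv => kv.Key)
    .ToList();

Console.WriteLine(string.Join(",", result));
```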
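You can use SelectMany to flatten the lists (removing duplicates within each list first, so an element repeated inside one list is not counted twice) and then pick all elements that occur twice or more. This is quadratic in the total number of elements, so it suits small inputs:
var singleList = stringLists.SelectMany(p => p.Distinct()).ToList();
var results = singleList.Where(p => singleList.Count(q => p == q) >= 2).Distinct();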

How to convert a String[] to an IDictionary<String, String>?

The values at the indices 0,2,4,... shall be keys, and consequently values at the indices 1,3,5,... shall be values.
Example:
new[] { "^BI", "connectORCL", "^CR", "connectCR" }
=>
new Dictionary<String, String> {{"^BI", "connectORCL"}, {"^CR", "connectCR"}};
I'd recommend a good old for loop for clarity. But if you insist on a LINQ query, this should work (a trailing unpaired element is silently ignored):
var dictionary = Enumerable.Range(0, array.Length/2)
    .ToDictionary(i => array[2*i], i => array[2*i+1]);
Dictionary<string,string> ArrayToDict(string[] arr)
{
if(arr.Length%2!=0)
throw new ArgumentException("Array doesn't contain an even number of entries");
Dictionary<string,string> dict=new Dictionary<string,string>();
for(int i=0;i<arr.Length/2;i++)
{
string key=arr[2*i];
string value=arr[2*i+1];
dict.Add(key,value);
}
return dict;
}
There's really no easy way to do this in LINQ (and even if there were, the intent certainly wouldn't be clear). It's easily accomplished with a simple loop, though:
// This code assumes you can guarantee your array to always have an even number
// of elements.
var array = new[] { "^BI", "connectORCL", "^CR", "connectCR" };
var dict = new Dictionary<string, string>();
for(int i=0; i < array.Length; i+=2)
{
dict.Add(array[i], array[i+1]);
}
Something like this maybe:
string[] keyValues = { "^BI", "connectORCL", "^CR", "connectCR" };
Dictionary<string, string> dict = new Dictionary<string, string>();
for (int i = 0; i < keyValues.Length; i+=2)
{
dict.Add(keyValues[i], keyValues[i + 1]);
}
Edit: People in the C# tag are damn fast...
If you have Rx as a dependency you can do:
strings
.BufferWithCount(2)
.ToDictionary(
buffer => buffer.First(), // key selector
buffer => buffer.Last()); // value selector
BufferWithCount(int count) takes the first count values from the input sequence and yields them as a list, then takes the next count values, and so on. So from your input sequence you get the pairs as lists, {"^BI", "connectORCL"} and {"^CR", "connectCR"}, and ToDictionary then takes the first item of each list as the key and the last (== second, for two-item lists) as the value.
However, if you don't use Rx, you can use this implementation of BufferWithCount:
static class EnumerableX
{
public static IEnumerable<IList<T>> BufferWithCount<T>(this IEnumerable<T> source, int count)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
if (count <= 0)
{
throw new ArgumentOutOfRangeException("count");
}
var buffer = new List<T>();
foreach (var t in source)
{
buffer.Add(t);
if (buffer.Count == count)
{
yield return buffer;
buffer = new List<T>();
}
}
if (buffer.Count > 0)
{
yield return buffer;
}
}
}
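On .NET 6 or later, the built-in Enumerable.Chunk performs the same fixed-size batching as BufferWithCount, so the pairing can be written without Rx or a custom extension. A minimal sketch as a top-level program, assuming .NET 6+:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var array = new[] { "^BI", "connectORCL", "^CR", "connectCR" };
if (array.Length % 2 != 0)
    throw new ArgumentException("Array doesn't contain an even number of entries");

// Chunk(2) yields string[] batches: ["^BI", "connectORCL"], ["^CR", "connectCR"]
Dictionary<string, string> dict = array
    .Chunk(2)
    .ToDictionary(pair => pair[0], pair => pair[1]);

foreach (var kvp in dict)
    Console.WriteLine($"{kvp.Key} => {kvp.Value}");
```

The length check matters: without it, an odd-length array would produce a final one-element chunk, and pair[1] would throw an IndexOutOfRangeException.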
It looks like other people have already beaten me to it and/or have more efficient answers, but I'm posting two ways:
A for loop might be the clearest way to accomplish this in this case...
var words = new[] { "^BI", "connectORCL", "^CR", "connectCR" };
var final = words.Where((w, i) => i % 2 == 0)
    .Select((w, i) => new[] { w, words[(i * 2) + 1] })
    .ToDictionary(arr => arr[0], arr => arr[1]);
final.Dump(); // Dump() is LINQPad's output helper
//alternate way using zip
var As = words.Where((w, i) => i % 2 == 0);
var Bs = words.Where((w, i) => i % 2 == 1);
var pairs = As.Zip(Bs, (first, second) => new[] { first, second })
    .ToDictionary(arr => arr[0], arr => arr[1]);
pairs.Dump();
FYI, this is what I ended up with using a loop and implementing it as an extension method:
// extension methods must be declared inside a non-nested static class
internal static Boolean IsEven(this Int32 @this)
{
    return @this % 2 == 0;
}
internal static IDictionary<String, String> ToDictionary(this String[] @this)
{
    if (!@this.Length.IsEven())
        throw new ArgumentException("Array doesn't contain an even number of entries");
    var dictionary = new Dictionary<String, String>();
    for (var i = 0; i < @this.Length; i += 2)
    {
        var key = @this[i];
        var value = @this[i + 1];
        dictionary.Add(key, value);
    }
    return dictionary;
}
Pure Linq
Select : Project original string value and its index.
GroupBy : Group adjacent pairs.
Convert each group into dictionary entry.
string[] arr = new string[] { "^BI", "connectORCL", "^CR", "connectCR" };
var dictionary = arr.Select((value,i) => new {Value = value,Index = i})
.GroupBy(value => value.Index / 2)
.ToDictionary(g => g.FirstOrDefault().Value,
g => g.Skip(1).FirstOrDefault().Value);

How to count similar words in a list?

I have a C# list with a lot of repeated names, and I want to count how many times each word occurs.
Example
Suppose the list has these values:
one,one,one,two,two,four,four,four
Then I want to calculate this:
one 3
two 2
four 3
How can I calculate values like this from the list?
I would split the string on commas, loop through the results, and add each word to a hashtable or dictionary with a value of one. If the word (key) is already present, increment its value instead.
string[] values = "one,one,one,two,two,four,four,four".Split(',');
var counts = new Dictionary<string, int>();
foreach (string value in values) {
if (counts.ContainsKey(value))
counts[value] = counts[value] + 1;
else
counts.Add(value, 1);
}
Or, if you prefer, here is a LINQ solution
var counts = values.GroupBy<string, string, int>(k => k, e => 1)
.Select(f => new KeyValuePair<string, int>(f.Key, f.Sum()))
.ToDictionary(k => k.Key, e => e.Value);
Here is a solution based on Linq:
string s = "one,one,one,two,two,four,four,four";
List<string> list = s.Split(',').ToList();
Dictionary<string, int> dictionary = list.GroupBy(x => x)
.ToDictionary(x => x.Key, x => x.Count());
foreach (var kvp in dictionary)
Console.WriteLine("{0}: {1}", kvp.Key, kvp.Value);
Output:
one: 3
two: 2
four: 3
This solution doesn't take advantage of the fact that equal values are consecutive. If that is always the case, a slightly faster solution could be written, but this one is fine for short lists or when the items can come in any order.
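If equal values are guaranteed to be consecutive, as in the example, one possible approach is to count runs in a single pass, which avoids hashing each word. This sketch is a C# 9+ top-level program and assumes that guarantee: a word that reappears later, after a different word, starts a new run and is reported twice.

```csharp
using System;
using System.Collections.Generic;

var words = "one,one,one,two,two,four,four,four".Split(',');

// Each run of equal, adjacent words becomes one (Word, Count) entry.
var runs = new List<(string Word, int Count)>();
foreach (var word in words)
{
    if (runs.Count > 0 && runs[runs.Count - 1].Word == word)
        runs[runs.Count - 1] = (word, runs[runs.Count - 1].Count + 1);
    else
        runs.Add((word, 1));
}

foreach (var (word, count) in runs)
    Console.WriteLine($"{word} {count}"); // one 3, two 2, four 3 (one per line)
```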
Dictionary<string, int> listCount = new Dictionary<string, int>();
for (int i = 0; i < yourList.Count; i++)
{
    // trim once so the lookup and the update use the same key
    string key = yourList[i].Trim();
    if (listCount.ContainsKey(key))
        listCount[key] = listCount[key] + 1;
    else
        listCount[key] = 1;
}
For a List, you could do the following:
List<string> list = new List<string>()
{
    "One",
    "One",
    "Two",
    // etc
};
Dictionary<string, int> d = new Dictionary<string, int>();
foreach (string s in list)
{
    if (d.ContainsKey(s))
        d[s]++;
    else
        d.Add(s, 1);
}
The preferred (and cleaner) method is to do this using GroupBy and Count with LINQ, but I don't have the time to type out the syntax at the moment.
Good luck!
